ABSTRACT 

A project to develop a measure of English language 
proficiency (MELP) for use in a national survey of income and 
education, to estimate the number of people of limited English 
proficiency, is reported. The preferred form of the instrument was a 
short series of questions to be asked by an interviewer and answered 
by an adult member of a household. Principal activities in the 
project were: (1) development of possible MELP questions; (2) 
development of criterion instruments against which to validate them; (3) 
field-testing of MELP questions in several ethnic groups; (4) 
analysis of resulting data to select the best questions for survey 
use; (5) derivation of scoring keys to translate any pattern of 
responses into language proficiency categorization; and (6) 
examination of methodological questions concerning surveys of 
populations whose native language is not English. A set of about 10 
questions was selected for inclusion in the MELP, based on high 
correlation with respondents' performances on a developed test of 
English proficiency and on school language proficiency 
classifications. Slightly different questions were chosen for adults 
and children. The report describes, in some detail, the procedures 
used, results obtained, and conclusions drawn. Appended materials 
include data from analyses, and other supplementary material. 
(MSE) 






A Project to Develop a Measure of English Language Proficiency 



FINAL REPORT 



to the 



National Center for Education Statistics under Contract #300-75-0253 



Submitted by 
Walter Stolz and Margaret Bruck 
June 15, 1976 

Center for Applied Linguistics 
1611 North Kent Street 
Arlington, Virginia 22209 







Abstract 



This project was to develop a Measure of English Language Proficiency (MELP) for use 
in the Survey of Income and Education (SIE), a large-scale national survey to 
estimate the number of people who are of Limited English-Speaking Ability (LESA) as 
defined in P.L. 93-380. The preferred form of the instrument was a short series of 
questions to be asked by an interviewer and answered by a single adult member of a 
household. Principal activities in this contract were: (1) development of possible 
MELP questions; (2) development of criterion instruments against which to validate 
them; (3) field-testing of the MELP questions in various ethnic groups; (4) analysis 
of the resulting data to select the "best" MELP questions for use in the survey; 
(5) derivation of "scoring keys" by which to translate any pattern of responses to 
the MELP questions into a categorization of either LESA or non-LESA; and (6) 
examination of two methodological questions relative to surveying populations whose 
native language is not English: 

(a) Can a single household respondent give accurate data about all other mem- 
bers of his household? 

(b) What differences exist between data collected by monolingual English- 
speaking interviewers and those collected by bilingual interviewers who 
are members of the same ethnic group as the respondent? 

A set of approximately ten questions was chosen for inclusion in the MELP 
on the basis of their high correlations with respondents' performances on the 
developed test of English proficiency and with their school classifications as 
being either LESA or not. Slightly different sets of questions were chosen for 
adults and children. Discriminant functions were derived using the responses to 
these MELP questions; these discriminant functions yielded a classification 
accuracy of 75%-80% when matched against the criteria in a population which had 
58% LESA individuals. 



An alternative approach to scoring the MELP questions consisted of simply 
defining certain response patterns as LESA and all others as non-LESA. Such an 
approach yielded accuracies similar to those of the discriminant functions. 

It was found that responses given by a household respondent about others in 
his household were generally in agreement with those given by the individual 
himself -- except for a slightly higher incidence of "don't know" responses on the 
part of the household respondent. Data collected by monolingual English interviewers 
were generally found to be indistinguishable from data collected by bilingual 
interviewers. 

A problem was discovered in the generalizability of any scoring formula derived 
in the field test to data collected in the SIE because of sampling differences 
between the two studies. Thus, it was recommended that the scoring formulae be 
recalibrated using a sub-sample of the SIE sample. 

I. Introduction 



1. Background 

Section 731(c) of Title VII, the Bilingual Education Act, Section 105(a) of P.L. 
93-380, the Educational Amendments of 1974, mandates a report on the condition of 
bilingual education in the nation, including: 

(1) "A national assessment of the educational needs of children and 
other persons with limited English-speaking ability and of the extent 
to which such needs are being met from Federal, State and local efforts, 
including (A) not later than July 1, 1977, the results of a survey of 
the number of such children and persons in the States, and (B) a plan, 
including cost estimates, for extending programs of bilingual educa- 
tion and bilingual vocational and adult education programs to all such 
preschool and elementary school children and other persons of limited 
English-speaking ability, including a phased plan for the training of the 
necessary teachers and other educational personnel necessary for such 
purposes; ... and 

(4) "An assessment of the number of teachers and other educational 
personnel needed to carry out programs of bilingual education under this 
title and those carried out under other programs for persons of limited 
English-speaking ability...." 

The survey mentioned above was assigned to the National Center for Education 
Statistics (NCES), and the decision was made to implement it in conjunction with 
another mandated survey, this one of the number of school-aged children in poverty, 
mandated in Section 822(A) of P.L. 93-380. This latter survey was assigned to the 
Secretary of Commerce (Bureau of the Census) and the "bilingual" survey was 
"piggy-backed" 



onto it. Concretely, this meant that both economic and language questions would 
be asked of a single very large sample of households. A basic sample of about 
155,000 households was designed so as to yield adequate accuracy for the economic 
data; and an additional sample of 35,000 households was chosen to supplement the 
main sample to assure a reasonable accuracy level for the English-speaking ability 
information in each state. This yielded a total sample of 190,000 households to 
be screened for language data. Finally, a number of questions about health and 
welfare programs were added to the questionnaire by the Office of the Secretary 
of HEW. The entire survey effort was named the "Survey of Income and Education" 
(SIE) and was scheduled to be conducted in Spring, 1976. In order to meet their 
own production schedule, Census set a deadline of October 3, 1975 for NCES to 
submit to them the bilingual section of the SIE instrument. 

In May, 1975, CAL received a letter from NCES requesting a proposal for 
research and development activities leading to such a measure of English language 
proficiency (MELP). Accompanying the letter was a set of design specifications for 
the project which had been submitted to NCES on March 24, 1975 by Burton R. Fisher, 
Professor of Sociology at the University of Wisconsin. CAL's proposal was to be 
submitted to NCES no later than May 15. Both the letter and Fisher's design 
specifications are appended to this report (Appendices 1 and 2). 

2. Design Specifications for the MELP Instrument 

The MELP to be developed had to satisfy two broad criteria: first, it had 
to be an acceptable and valid measure of English proficiency as that construct is 
defined in the relevant legislation, and second, it had to be usable within the 
context of the SIE, a large-scale personal interview survey conducted in house- 
holds. Each of these criteria will be elaborated and their implications discussed 
below. 



The Construct of Limited English-Speaking Ability 

The objective of the survey was to enumerate, in each state, persons who 
were to be considered of "Limited English-Speaking Ability" (LESA). Section 703 
of P.L. 93-380 provides a definition of LESA as follows: 

"Sec. 703. (a) The following definitions shall apply to the terms 

used in this title: 

"(1) The term 'limited English-speaking ability', when used with 
reference to an individual, means 

"(A) individuals who were not born in the United States or whose 
native language is a language other than English, and 
"(B) individuals who come from environments where a language 
other than English is dominant, as further defined by the 
Commissioner by regulations; 
and, by reason thereof, have difficulty speaking and understanding 
instruction in the English language. 

"(2) The term 'native language', when used with reference to an 
individual of limited English-speaking ability, means the language nor- 
mally used by such individuals, or in the case of a child, the language 
normally used by the parents of the child." 

Fisher further defines the construct as follows: 

The phrase ". . . speaking and understanding instruction in the English 
language..." is interpretated to mean oral production (encoding in speech) 
and aural comprehension (decoding others' speech) in English. In the several 
education statutes, when reading and writing have been in mind the sophisti- 
cated statute drafters have seen fit to specify them directly; such specifica- 
tion is absent here. (Fisher, Pg. 3) 



The MELP to be developed for use in the survey needed to relate as directly 
as possible to the legislatively-defined LESA construct. Thus, the MELP was 
to have the following characteristics: 

1. MELP was to measure English proficiency only; not proficiency 
in any other language nor language dominance. 

2. It did not need to measure reading and writing skills -- nor 
could it assume them to be present. 

3. It had to be targeted on speaking and comprehension skills 
as required in educational settings. 

The Population Relevant to the MELP. The legislative definition quoted above, when 
viewed from the perspective of a survey, implies a two-stage determination of limited 
English speakers. The first is to isolate the pool of potential LESA individuals 
as defined in the Bilingual Education Act. These are persons who were not born 
in the U.S. or whose native language is not English or who come from an environment 
where a language other than English is dominant. Satisfying at least one of 
the above conditions is necessary but not sufficient for a person to be classified 
as LESA. The second stage is to determine in the survey which of the potential 
LESAs actually would "have difficulty speaking and understanding instruction in the 
English language" because of their non-English background. Thus, the SIE was 
pictured as containing a set of "screening items" which would determine whether a 
person qualified as a potential LESA individual (i.e., had a background involving 
a non-English language). If so, then the MELP was to be obtained for that person, 
and if not, the MELP part of the SIE would be skipped for that person. Fisher says 
of the screening questions: 

The formulation of these "screening" questions is not a simple matter 
at all, and there is considerable controversy as to the nature of language 
questions in Census work. (See Lieberson, 1966, and others.) Under these 
circumstances, it would be highly desirable that this set of questions be 



prepared by the R&D contractor in close association with Census people. 
(p. 2) 
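
The two-stage determination described above can be sketched in code. The following is only a minimal illustration and not part of the SIE instrument: the Person fields and the function names are hypothetical, and the last field stands in for whatever the MELP questions would estimate.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # All field names are hypothetical labels for the conditions in the text.
    born_in_us: bool
    native_language_english: bool
    non_english_dominant_environment: bool
    # Stand-in for the outcome the MELP is meant to estimate.
    has_difficulty_with_english_instruction: bool


def is_potential_lesa(p: Person) -> bool:
    """Stage 1 (screening): at least one background condition must hold."""
    return (
        (not p.born_in_us)
        or (not p.native_language_english)
        or p.non_english_dominant_environment
    )


def classify(p: Person) -> str:
    """Stage 2: the MELP is obtained only for persons who pass screening."""
    if not is_potential_lesa(p):
        return "non-LESA"  # MELP part of the interview is skipped
    if p.has_difficulty_with_english_instruction:
        return "LESA"
    return "non-LESA"
```

Passing the screening stage is necessary but not sufficient: a person born outside the U.S. who has no difficulty with instruction in English is still classified as non-LESA.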

As a pre-test of the screening questions, NCES added a "Survey of Languages" to 
the July, 1975 Current Population Survey, a monthly national survey of about 
45,000 households taken by the Census Bureau for the Bureau of Labor Statistics. 
Those questions concentrated on probing for languages other than English present 
in the household and the native language backgrounds and ethnic origins of the 
household members. Thus, our project's primary responsibility was to develop the 
instrument to be used in the second stage of LESA identification; however, the first 
stage screening questions were also clearly a matter of importance to us. 

With respect to the range of ages that the instrument must cover, Fisher 
concludes: 

Other references in P.L. 93-380 (to preschool education; to auxiliary 
and supplementary programs for parents of LESA pupils; to elementary and 
secondary education; to bilingual education under the Adult, Vocational and 
Higher Education Acts), and the language of Sec. 731 (c) mandating this 
survey make it clear that the "individual" referred to above may be of any 
age. However, individuals aged 5-17 seem to be of special interest. (p. 2) 

Constraints as to the Form of the MELP 

Fisher was quite specific in characterizing the constraints that the necess- 
ities of the Census Bureau imposed on the form of the MELP: 

Census people say that if measurement of LESA is to be carried out in 
the Census survey, at least four constraints must be observed. 

a. "Testing" in any overt form, identifiable by respondents as such, 
is definitely excluded; this applies especially to "paper-and-pencil" tests. 



This places a limit on the kinds of response-eliciting stimuli which can 
be used to get at LESA. 

b. Also categorically excluded is electronic recording of what the 
respondent says, for later analysis and coding. This places a limit on the 
kinds of responses to be recorded and the locus of assessment of these 
responses. 

c. A third explicit constraint: LESA measurement procedures must 
not break rapport during the interview, must fit "naturally" into the context 
and content of a CPS-like interview (face-to-face or via telephone), 
and must be within the capacity of its usual CPS and CPS-like interviewers. 
(On the whole, the latter are women 35 - 40 years of age, with a high 
school education.) The procedures must not disrupt them. 

d. The strong preference of the Census staff is for as simple a measure 
as is feasible, with a small series of direct questions, answerable by 
the usual respondent for the household about all of the other members of 
the household. (In about 60% of CPS interviews, this is the mother.) That 
is, the preference is for enumeration of the household members, without 
sampling within the household to select the actual respondents. 

This is a strong Census preference, not an absolute requirement. 
Whether this preference can be gratified, given the need for an adequate 
measure of LESA (a key NCES requirement), is an empirical question to 
be answered in the course of R & D work. (p. 1) 

Acceptability of the MELP. NCES recognized that if the results of the survey 
were to be useful to the Congress, they must have the support of a number of 
concerned constituencies; thus the measure itself must also be accepted as viable 
by those constituencies. They included at least: the various non-English speaking 
minority group organizations, the educational community, and the research community. 
Therefore, a vital requirement of the project from the beginning was to obtain 
meaningful input and response from all interested parties at all stages of the work. 

3. Design Specifications for the Research and Development Effort 

In broadest outline, the project had two objectives. One was to pick the best 
MELP possible from among the alternatives which conformed to the specifications 
outlined above, and the other was to gather validity information to indicate the 
instrument's strengths and weaknesses. Given the very brief time schedule, it was 
clear from the beginning that both objectives had to be pursued more or less 
simultaneously. 

Alternative Forms of the MELP - In Chapter II of this report the various approaches 
to language proficiency assessment will be considered in detail, but it is appro- 
priate here to at least outline the range of techniques available. 

Fisher discusses several kinds of MELPs that might fit Census' specifications.* 
One is simply to ask the Household Respondent about the English proficiency of each 
individual in the household in a very direct way. Such questions might involve 
direct ratings of proficiency as well as information about the situations in which 
each person normally uses English and his history of contact with the language. 
What literature does exist on this topic indicates that the answers to such questions 
may be highly correlated with more conventional measures (tests) of English pro- 
ficiency (cf. Scott, 1973; Bowen, 1974; Capco and Tucker, 1970; and Fishman, Cooper, 
and Ma, 1971). 

* In this report the term MELP will be reserved for indicating an instrument for 
identifying LESA individuals within the context of the SIE. 

A second approach discussed by Fisher is that the interviewer assesses the 
individual's proficiency on the basis of his behavior in the interview. Given the ban by 
Census on testing or tape recording in the interview situation, this boils down to 
the interviewer making a rating of the respondent's English proficiency as 
displayed in the course of the interview or scoring the presence or absence of specific 
linguistic features in the respondent's speech. Fisher puts it this way: 

If direct questions about how well an individual speaks and how well 
an individual understands English, put to that individual or to someone 
else about him, yield unsatisfactory MELP data, there is an alternative 
approach. The individual's speaking and understanding behaviors may be 
observed during the course of the interview itself, in response to questions 
which at least overtly do not appear to attempt to elicit either a 
range of language behaviors or an assessment of language behavior by the 
respondent ... The interviewer may be trained to record and assess/rate 
behaviors he has been cued to watch for on forms developed by the R&D work 
on MELP. This is not unusual procedure in good psychological and social 
research and in assessment work in organizations. People without previous 
expertise and special qualifications have been successfully trained to make 
reliable and accurate reports and assessments of behaviors during group 
interactions and individual performances, in field and in laboratory 
situations. (p. 3) 

A serious implication of this approach would be that the interviewer would have 
to talk with each person who was rated. This would undoubtedly call for some sort 
of within-household sampling and a significant reduction in the total number of 
individuals for which LESA and non-LESA categorizations could be obtained because 
of the greater cost of directly interviewing more than one respondent within a 
household. 



Criterion Instruments - But the needs of this project extended beyond instruments 
which could possibly qualify as SIE MELPs, since a primary purpose of the R & D effort 
was to validate such a MELP, and that implies validating it against some other 
instrument -- presumably a more direct, accurate, or widely accepted measure -- 
which could serve as a criterion during field testing. While such instruments did 
not have as an absolute requirement the restrictions on form imposed by the Census 
Bureau (since they were to be used only in our field test), there were severe 
logistical constraints on what could be used because of the scope and time schedule 
of the field test activities. In particular, since the objective of any field test 
would be to try out an instrument under conditions similar to those of its eventual 
use, the field test had to be household-based, and thus the criterion measure(s) 
had to be usable in a household setting. This would seem to eliminate assessment 
procedures involving costly and/or delicate equipment. Also, the criterion measures 
had to be applicable to people of all ages and from all ethnic-linguistic groups. 
None of the measures could assume reading or writing skills on the part of the 
respondent. Given all of these constraints, criterion instruments had to measure 
as directly as possible language functions necessary for success in educational 
settings. 

Validation - Fisher offers the following discussion of validation vis a vis 
educational criteria: 

(a) On validity: MELP is to measure what it is intended to measure -- 
the characteristics and relative proficiency of "speaking and understanding 
instruction in the English language," which make a difference or could 
make a difference in the individual's progress in a course of education 
or training. How "limited" ESA is, for present purposes, is to be referred 
against the language performance of individuals whose ESAs are seen by the 
schools as barriers of varying strength to effective learning, when instruction 
is in English. 

and thus elicits little agreement from specialists about what is the single 
"best" approach, then a logical reference is to those actually making assessments 
in a routine way, however it is being done. If it is unclear what is the best 
approach, then a viable objective is to simulate the most enlightened among the 
currently used practices. Second, since administrative identifications are typically 
used for making decisions among a small number of alternatives, they are 
usually procedures yielding discrete and often binary classifications. On the 
contrary, most non-administrative assessment instruments yield scores that are 
basically continuous in nature and do not lend themselves to making dichotomous 
classifications without considerable arbitrariness. Thus, schools' administrative 
screening procedures for non-English speaking students were to play an important 
part in this project. 

The general strategy employed with respect to validity was to focus on content 
validity and on concurrent validity. Content validity was addressed first by re- 
cruiting a staff with expertise in test development and linguistics and who also 
were drawn from a number of ethnic-linguistic groups. Second, we asked a number 
of specialists in the areas of language and language testing who were not other- 
wise associated with the project to comment on the adequacy of both criterion 
measures and possible MELPs. Third, CAL convened a large board of "Language 
Group Representatives" to criticize early versions of all instruments and to make 
suggestions about how they could be improved to be more "culture-fair" relative to 
each representative's group. 

Concurrent validity was obtained by eliciting data from field test respondents 
on several "criterion" measures of English proficiency, each representing a 
particular approach to language assessment. (As will be seen, at least as much 
effort initially went into the development of appropriate criterion instruments as 
went into the development of the MELP itself.) 

The distinction between a valid measure and an accurate estimator. Although finding 
a valid MELP was an important objective of the project, the overriding objective 
of the MELP itself is important to keep in mind; that is, to accurately estimate the 
proportion of LESA individuals in the country. There is a crucial but subtle 
difference between validating a measure of English language proficiency and 
constructing a procedure to estimate the proportion of limited English speakers in the 
country. When validating a new measure, one correlates it with "criterion" measures 
of the same construct -- measures which are already established or are more direct 
measures than the one to be validated. The important issue in validation is the 
extent to which the candidate measure tends to agree with (give the same answer as) 
the criterion measure(s) on a person by person basis across a large number of 
respondents. 

On the other hand, when constructing an estimator of a population parameter, 
it is most important that the estimator performs accurately at the level of the 
population. Thus, if the "true" proportion of LESA individuals in a given 
population is 0.2, the crucial property of a successful procedure for estimating that 
quantity is that it gives a value of about 0.2. Whether or not the estimator 
classifies the "correct" 20 percent of the population as being LESA is a secondary 
consideration. For example, consider the following three tables involving mythical 
populations of 100 persons each. 

Table 1: 

                            "true categorization" 

                            LESA    -LESA    Total 

Candidate      LESA           20        0       20 
estimator 
               -LESA           0       80       80 

               Total          20       80      100 


Table 2: 

                            "true categorization" 

                            LESA    -LESA    Total 

Candidate      LESA            0       20       20 
estimator 
               -LESA          20       60       80 

               Total          20       80      100 


Table 3: 

                            "true categorization" 

                            LESA    -LESA    Total 

Candidate      LESA           20       20       40 
estimator 
               -LESA           0       60       60 

               Total          20       80      100 




In Table 1, we have the best of all situations in that there is perfect agreement 
between the candidate estimator's categorization of the 100 individuals and 
their respective "true" categorizations. This estimator is thus a perfect estimator 
(it estimates the same percent of individuals to be LESA as is the true case), and 
it is also perfectly valid (every individual is assigned to the correct category). 
In Table 2, however, the estimator is an accurate estimator, since it gives the 
correct proportion of LESA persons in the population, but it is not particularly 
valid in the sense that it gives the correct categorization for only 60 of the 
people. Finally, in Table 3, the estimator is relatively valid -- giving the correct 
classification for 80 of 100 people -- but a poor estimator since it overestimates 
the number of LESAs in the population by 100%. While very high validity 
in the above sense is desirable, because it implies an accurate estimator, we must 
never forget what our ultimate objective is: to produce a good estimator of the 
proportion of LESA individuals in the nation. It is conceivable, then, that this 
project could find a MELP which is not highly valid as compared with available 
criterion measures -- all fallible to be sure -- yet which is a reasonably accurate 
estimator in the sense of closely matching the proportion of LESA individuals in a 
population as given by one or more of these criterion measures. The situation is 
somewhat curious if the true proportion of LESAs in a population is quite small, 
say 10%. Then, a MELP which simply declared everyone to be non-LESA would be 90% 
valid; however, it would be nonsense as an estimator of the proportion of LESAs. 
On the other hand, a MELP operating in such a population might display a validity 
of 90% or less while estimating the true proportion of LESAs quite closely. It 
would achieve this by falsely categorizing approximately equal numbers of LESA and 
non-LESA individuals. Generally speaking, we will evaluate all MELPs both in terms 
of validity and accuracy of estimation. The former will be indexed simply by the 
proportion of a sample categorized the same by both MELP and the criterion measure 
against which it is being compared. The latter will be indexed by a quantity to 
be called "bias" (see Chapter VII), which will be a function of the difference 
between the proportion of the sample identified as LESA by the MELP and that 
identified as LESA by the criterion. 
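
The distinction between validity and accuracy of estimation can be checked numerically against the three mythical tables. The sketch below is illustrative only: the function name and argument names are hypothetical, and bias is taken here to be the simple difference between the two proportions, which is an assumption, since the exact function is defined only in Chapter VII.

```python
def validity_and_bias(both_lesa, est_only, true_only, neither):
    """Summarize a 2x2 table of estimator category by "true" category.

    both_lesa: LESA by the estimator and truly LESA
    est_only:  LESA by the estimator but truly non-LESA
    true_only: non-LESA by the estimator but truly LESA
    neither:   non-LESA by both
    """
    total = both_lesa + est_only + true_only + neither
    # Validity: proportion of persons categorized the same way by both.
    validity = (both_lesa + neither) / total
    # Bias (assumed form): estimated LESA proportion minus true proportion.
    estimated_proportion = (both_lesa + est_only) / total
    true_proportion = (both_lesa + true_only) / total
    return validity, estimated_proportion - true_proportion


tables = {
    "Table 1": (20, 0, 0, 80),
    "Table 2": (0, 20, 20, 60),
    "Table 3": (20, 20, 0, 60),
    # The 10% population in which everyone is declared non-LESA.
    "All non-LESA": (0, 0, 10, 90),
}

for name, cells in tables.items():
    validity, bias = validity_and_bias(*cells)
    print(f"{name}: validity = {validity:.2f}, bias = {bias:+.2f}")
```

Under this assumed definition, Table 2 has a validity of only 0.60 but zero bias, while Table 3 has a validity of 0.80 and a bias of 0.20, that is, an estimated proportion twice the true one.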

4. Investigating the Accuracy of Data Given by the Household Respondent 

An important requirement of any MELP questions which were to fit Census' 
desired guidelines was that one adult in the household (the Household Respondent) 
had to provide accurate answers to the questions for every member of the household. 
This matter was investigated within our study in the following way: The interviewer 
was told to follow "standard" Census Bureau interviewing procedures in the sense of 
beginning each household interview by locating a responsible adult who was willing 
and able to act as the Household Respondent and to provide information about another 
member of the household. While in the SIE questions would be asked about all others 
in the household, in the present study our focus was on only one designated 
individual -- generally a child or adult whose name we had received from the local school. 
The procedure was then to ask all Census-type questions of the Household Respondent 
about this Designated Respondent. Then, if the Designated Respondent was an adult, 
the questions were also asked of him directly about himself. Although our 
interviewers collected some questionnaire data directly from child Designated Respondents, 
they were not analyzed because Census did not plan to collect such information from 
children under any circumstances. Therefore, all questionnaire data collected on 
children in this study can be considered to have been provided by an adult Household 
Respondent and thus qualify as essentially "proxy" data. On the other hand, every 
adult Designated Respondent in the study provided questionnaire data about himself, 
and these "first hand" data formed the basis of all analyses of adult MELP data 
reported in Chapters V and VIII. In addition, proxy data were collected from a 
Household Respondent different from the adult Designated Respondent when such an 
individual was available at the time of the interview. In single-adult households, the 
adult Designated Respondent and the Household Respondent had to be the same person 
and thus proxy data were simply not available for that individual. The relationship 
of the proxy and first-hand data for adults is discussed in Chapter X. 

Of course, all criterion instruments were administered directly to the Desig- 
nated Respondent. 

5> " The Lano;uag;e Ability of the Interviewer 

Another concern about the accuracy of the data revolved around the fact that
monolingual (English speaking) interviewers would inevitably be dealing with
respondents whose English proficiency ranged from excellent to none. And, in
addition to the linguistic factor, there was also the cultural difference between the
monolingual, probably Anglo, interviewer and the ethnically distinct respondent.
This difference could easily take its toll in refusals to be interviewed or on the
rapport between the two and thus influence the character of the data collected. In
order to evaluate the severity of these problems, one component of the design of the
field test was to compare the data collected by monolingual (English) interviewers
and bilingual interviewers whose native language and ethnic origin were those of the
respondent. This was done by matching the assignments of monolingual with
bilingual interviewers in each site through randomizing the names and addresses of the
individuals they were to interview. Monolingual interviewers were given standard
Census instructions, that is, if communication with the Household Respondent was
severely impeded by the respondent's lack of English proficiency, the interviewer was
to find someone else in the household or neighborhood who could act as a translator.
Bilingual interviewers were instructed to conduct their interviews in English
whenever possible and to refer to the native language only when absolutely necessary.
They were encouraged to consult informally with one another in advance about the
proper translations of various questions, but no formal, written translations of
the questions were used.
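
The matching of assignments through randomization can be illustrated with a short
sketch. The report does not give the mechanics of the procedure, so the following
assumes the simplest case: one site's list of sampled households is shuffled and
split evenly between the two interviewer types. The function name, the seed
parameter, and the even split are all hypothetical.

```python
import random


def split_assignments(households, seed=None):
    """Randomly divide one site's households between interviewer types.

    Assumption: an even split of a shuffled list; the report only says
    that names and addresses were randomized.
    """
    rng = random.Random(seed)
    shuffled = list(households)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "monolingual": shuffled[:half],
        "bilingual": shuffled[half:],
    }


if __name__ == "__main__":
    site = [f"household-{i}" for i in range(1, 11)]  # made-up identifiers
    for kind, assigned in split_assignments(site, seed=1).items():
        print(kind, assigned)
```

Because each household is equally likely to fall to either type of interviewer,
systematic differences between the two sets of data can more plausibly be
attributed to the interviewers than to the households.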

6. The Organization of this Report 

In subsequent parts of this report, the project's activities will be described 
in the following order: 

1. A review of the various approaches to measuring language proficiency
(Chapter II).

2. The instrument development activities both of possible MELPs and
various criterion measures (Chapter III).

3. The field test in which the instruments were used in several ethnic- 
linguistic communities (Chapter IV). 

4. The selection of the MELP questions for recommendation to NCES (Chapter V).

5. Analyses of the criterion measure data, particularly focusing on the re- 
lationships among the measures (Chapter VI) . 

6. Construction of scoring keys for children and adults by which individuals
could be categorized as LESA or not on the basis of their responses to the
MELP questions (Chapters VII and VIII).

7. Observations on generalizing the results of Chapters VII and VIII to
determining LESA and non-LESA categorizations for individuals surveyed in
the SIE (Chapter IX).

8. Investigation of the validity of the MELP data provided by a Household
Informant about other adult members of the household (Chapter X).

9. Investigation of "interviewer effects", comparing the data collected
by interviewers who are from the same ethnic-linguistic community as the
respondent with data collected by monolingual English "Anglo" speakers
(Chapter XI).

II. Alternative Approaches to Language Assessment

1. Background

In the past decade, the nature of language assessment has changed as a result
of a shift of emphasis in current linguistic theory from structuralism to
functionalism. Through the 1960's, the language tests reflected the viewpoint of the
structural linguists (cf. Chomsky, 1965): that language is grammar-based and can
be divided into such subcomponents as phonology, syntax, and semantics. English
language proficiency tests were constructed to measure the individual's knowledge
of a number of these structures.

In the late 1960's, some linguists (e.g., Hymes, 1967; Labov, 1970) emphasized
that knowing a language involved more than being able to conform to its rules of 
syntax, phonology, and vocabulary; it also included being able to use language in 
communication situations. The speaker had to demonstrate that he knew when to 
speak, to whom he should speak, where he should speak and how he should speak. 
Functional and communicative aspects of language were stressed. The individual's
ability to express himself appropriately and make himself understood was examined.
Test constructors emphasized the importance of collecting data in "natural" or 
contextually relevant situations. Instruments were developed to assess global 
communication skills in specific types of contexts (e.g. the classroom) rather 
than a number of specific grammatical, phonological, and semantic skills in a gener- 
alized or unspecified context. 

This drift in both theoretical and measurement emphases illustrates how
tentative the linguist's hypotheses are about the nature of language. It is most
important to recognize this tentativeness when evaluating the adequacy of language
proficiency tests, since different test developers may have rather different
conceptualizations about the nature of the phenomenon that they are attempting to
measure. For example, the Illinois Test of Psycholinguistic Abilities (Kirk,
McCarthy, and Kirk, 1968) was created within the context of Charles Osgood's theory
of language. Within any other frame of reference (e.g., that of most linguists) the
test is of dubious validity. In the present case, a test may be reasonably valid
to a person viewing language proficiency as tacit knowledge of an isolated set of
syntactic, phonological, and semantic rules, but it may be quite beside the point
for someone viewing language as the ability to perform "appropriately" in a set
of communication situations. Only after agreement is reached on what is to be
measured can one set about evaluating the effectiveness of various measurement
approaches. In terms of "validity", as the term is used by psychometricians, we
have a situation where "experts" may not agree on the construct validity of a
given instrument because they do not agree on the construct itself. Such a
condition essentially precludes the existence of any universally accepted measure or
test of the construct, and this is exactly what a review of the language testing
literature shows, i.e., that there is considerable disagreement among specialists
about which of the hundreds of existing tests are "the best." Even within the
slightly more restricted domain of educational settings there is still little
consensus on "the best" instrument.

2. Criteria for Evaluating Tests 

Assuming, however, that some agreement can be reached about the nature of the 
phenomenon to be measured, it is useful to set up some criteria that an "ideal" 
measure of English proficiency should meet. We propose the following six criteria: 

1) The test should be a broad measure of English proficiency in the sense that
it should measure productive (speaking) as well as receptive (listening) skills.
For older children and adults it should also measure proficiency in reading and
writing (a criterion not relevant to the present application).

2) The test should reflect differential proficiencies in different domains of
use (e.g., home, school, church, peer, adult, etc.). (Again, for the purpose of
this project, the test need only be a measure of proficiency in one domain: the
school setting.)

3) The test should be reliable and valid, a universal requirement for any
test. It should have high construct, content, and face validity.

4) The test should yield scores that are readily interpretable relative to the
objectives of the testing. Usually this means that norms must be available for
groups similar to those with which the test is to be used. If the test has been
constructed as criterion-referenced or performance-based, then norms are not
necessary, provided that scores are interpreted as intended by the test constructor. In
some applications, where all comparisons and interpretations of scores are done
internally to the study (as in the present project), norms are not necessary because
comparisons of persons inside the study are not being made with persons outside the
study.

5) The test should be easy to administer in a reliable fashion. 

6) The test should be easy to score in an unequivocal fashion. 

3. A Typology for Classifying Tests

In this chapter a number of English language proficiency tests will be reviewed
and evaluated relative to the six criteria of the previous section. Each of these
tests is currently in use with adults and children from non-English or bilingual
backgrounds. In order to facilitate this review, however, the tests will be cast
into a four-fold typology. As will become clear, tests which are members of the
same type tend to share similar strengths and weaknesses relative to the criteria.
Thus, a number of important attributes of a test can often be identified simply
by placing it in its appropriate category.

The four categories are actually the conjunction of two independent
dimensions. These will be explained briefly and then more extensively as the tests
themselves are discussed.

The first dimension is labeled discrete-point vs. integrative and refers to
the assumptions and intents of the test constructor and the test user. A
discrete-point test is one which attempts to analyze English proficiency into its atomic
components and then test each of the components separately. This approach was
typical of the structural linguists of the 1950's and early 1960's who believed
that to test language proficiency one tested knowledge of the facts of the language,
e.g., syntactic rules, morphology, vocabulary, etc. The specific format of the test
was important only in that it should facilitate revealing the knowledge and not
impede it. (For example, the test format should not, in itself, place a heavy load
on memory or call on large amounts of non-linguistic, and thus irrelevant,
knowledge and abilities, e.g., intelligence.) The crucial feature, though, of the
discrete-point approach is the assumption that if one is "proficient" in knowing
enough of the components of a language, he is proficient in the language. In a
sense, a discrete-point test is a collection of mini-tests, each testing a separate
sub-construct and yielding a profile and summary measure of language proficiency.

An integrative test is one which involves a task assumed to call upon a large 
range of the phenomena under examination. The degree to which that task is accom- 
plished becomes the score on the test. For example, taking dictation is considered 
by many specialists to involve a large range of linguistic skills, both receptive 
and productive. An integrative test then might be to dictate a passage to a
respondent and simply count the number of errors he made in his transcription. An
integrative test is assumed to index the respondent's integrated English proficiency
rather than the separate components of his proficiency. 

The second dimension deals with the relevance of the assessment situation to
the behavior of interest, and it is called the direct-indirect dimension. A
direct test or assessment is one which samples directly from the behavior to be
evaluated. For example, if one is interested in English proficiency in the
classroom, a direct assessment would be to observe the respondent in his routine
classroom activities and then in some way rate or score his performance in that situation.
As the evaluation situation becomes more contrived and/or different from the
situation of interest, the test becomes more indirect. Notice the implicit assumption
here is that the evaluation is not of traits or abilities or knowledge residing
entirely within the respondent. Rather, the evaluation is of the individual's
abilities to interact with his environment in specified classes of situations. This
is a thoroughly appropriate assumption to make in the present project given the
legislative definition of LESA as being "difficulty in speaking and understanding
instruction" because of a non-English language background.

Since directness is a joint property of a test and what it is meant to measure, 
a test is neither direct nor indirect in and of itself. It may be very direct when 
used to measure one sort of behavior and indirect when measuring another. Valid 
direct tests are face-valid and construct-valid while indirect tests must generally
depend on the establishment of concurrent validity in order to be considered valid.
Also, it is clear that the direct-indirect distinction is in fact a continuum and
that tests are not direct or indirect in any absolute sense, but only more or 
less direct. 

Indirect-Discrete Point Tests. These tests can be sub-divided into two groups:
standardized and non-standardized. 

Two examples of standardized discrete-point indirect tests are: Test of English
as a Foreign Language or TOEFL (ETS, 1975) and Michigan Test of Language Proficiency
(Upshur et al., 1964). The Michigan test is designed to be a test of English
language proficiency for adults enrolled in college and is composed of three
sections: grammar, vocabulary, and reading comprehension. It measures such
language facts as: word order, noun and pronoun forms, verb tenses, modals,
ellipsis, prepositions, and idioms.

The TOEFL was also designed to measure the English proficiency of foreign
students applying for college admission into the U.S. It is composed of several
sections: Listening Comprehension, English Structure, Vocabulary, Reading
Comprehension, and Writing Ability. Items on these subtests are designed to measure
specific language facts.

Many of the unstandardized indirect discrete-point tests are pilot tests
for which later refinement and standardization are planned. Three are discussed:
Bilingual Syntax Measure (Burt, Dulay and Hernandez-Chavez, 1974), the MAT-SEA-CAL
(Matluck and Matluck, 1975), and the Ilyin Oral Interview (Ilyin, 1972).

The Bilingual Syntax Measure tests a child's (ages 4 to 9) ability to produce 
specific grammatical structures in English (or Spanish) which are supposedly impor- 
tant indicators of structural proficiency. The child is shown a picture, and is 
asked a specific question about it. The question is so phrased as to elicit a 
specific grammatical structure. 

The MAT-SEA-CAL was designed to measure a child's ability to understand and
produce distinctive characteristics of English. The three sections (Listening
Comprehension, Sentence Repetition, and Structural Response) test specific
phonological, morphological, syntactic, and lexical items.

The Ilyin Oral Interview is a test of oral English language proficiency for 
adults (from 13 years on). The examinee is asked to give complete statements in 
response to a series of questions based on a sequence of pictures. Answers are 
scored separately for information conveyed and grammatical elements. As in the
other two tests the questions are structured so as to elicit specific grammatical
structures.

The above three tests have been classified as examples of "indirect tests"
in that while the language testing situation is probably closer to "real-life"
than that of the standardized tests previously discussed, they do not represent
or directly sample from naturalistic situations. That is, in normal discourse
while we might ask people questions about pictures, we do not structure questions
to elicit specific linguistic forms, nor do we ask a string of 28 consecutive
questions. Thus, these instruments are thoroughly "test-like" and bear little
resemblance to normal dyadic interactions, even between students and teachers.

The test constructors of the three example tests described above all state 
that norms, reliability and validity for these tests are forthcoming. 

One additional test (or technique) should be mentioned in this section: 
imitation tests. Here the task is for the examiner to say a specific sentence (one 
long enough so that the examinee can't memorize it) which the examinee then is to 
repeat verbatim. The rationale for this technique is that correct repetitions
indicate underlying knowledge of the structure of the sentence. Although there is
no single generally accepted imitation test, it is easy enough for a test-constructor
to draw up and use a list of sentences which contain the important "language facts."
Examples of this approach are Naiman (1974), Menyuk (1963), and Natalicio and
Williams (1970).

How well do these types of tests meet the six criteria proposed for an "ideal" 
language proficiency test? First, the tests vary in terms of the range of language 
skills they assess. Some (TOEFL) assess reading, writing, and listening comprehen- 
sion, while others purport to test only oral skills (Ilyin, B.S.M.). However, there 
does not appear to be one test that measures all four language skills (speaking, 
understanding, reading and writing). Secondly, it appears that all these tests 
focus on one variety of language: formal standard English.
Third, while the standardized tests have norms and assessments of reliability
and concurrent validity attached to them, they and all indirect discrete-point
tests have recently been called into question because of the assumptions
underlying them. Critics take issue primarily with the assumption that language
proficiency is simply the tacit knowledge of a collection of "facts" about the
language which can be tested for, one by one, outside of any context in which the
respondent would normally use the language. Clearly, both the concept of
discrete-point testing and the indirect nature of most discrete-point tests are under attack.
(For a summary of these criticisms, see Jones and Spolsky, 1975; Upshur, 1971.)

The main advantages of indirect discrete point tests are that they are compara- 
tively easy to administer and score. 

Direct Discrete-Point Tests. The main differences between this set of tests and
those described in the previous section are in the techniques used to elicit the
individual's responses. Because these types of tasks attempt to elicit language
in "natural" situations, the responses are usually strings of sentences, rather than
single sentences or words. However, the tests are considered discrete-point in
that analysis of the subsequent responses involves counting and analyzing specific
structures which the test-constructor states are important subcomponents of language
proficiency. Two examples of these tests, the Basic Inventory of Natural Language
(BINL) (Herbert, 1975) and the Language Cognition Test (Stemmler, 1975), are tests
of productive skills for children. For the BINL, children are trained to talk to
each other about pictures. After a number of such training sessions (for which the
test constructor must do on-site workshops) the children's subsequent narratives
are recorded and analyzed for such features as syntactic complexity, fluency, and
sentence length.

The Language Cognition Test is similar to the BINL except that the child
talks to an adult about a picture and some familiar objects. The responses are
recorded and later analyzed for: basic sentence types, transformations, verb
constructions, and adjective types.

The disadvantages of these tests are that they only measure oral production; 
they have not been validated, or standardized, and there is no information on their 
reliability. While they may be easy to administer, the scoring procedures are quite 
lengthy and require some training of the scorer. Positively, these types of tests 
can be readily used to assess language in many domains. For example, one could
construct the elicitation situation in such a way that the subject tells a story
to his friend, or to his mother, or to his teacher, etc.

Direct-Integrative Tests. The procedure which best demonstrates a direct
integrative assessment of overall language proficiency (oral and written) is the
Foreign Service Institute's oral interview and rating technique (FSI, 1963). Here
the main emphasis is assessing how well a person can communicate in a language for
particular purposes in given situations. Usually the respondent is brought in to
converse for a half hour or so with two observers, at least one of whom is a native
speaker of the language. The topics and the situations covered generally are chosen
to be as similar to typical on-the-job situations as possible. The speaking test
ends when the two interviewers are satisfied they have pinpointed the respondent's
rating level. This usually occurs within 30 minutes (and frequently within 5 to
10 minutes). The 9-point rating scale ranges from (1), which is defined as
elementary proficiency, to (5), which is native or bilingual proficiency. Each rating
is well defined in terms of the level of language used. For example, the first
level (Elementary Proficiency) is accompanied by the following description:

Elementary Proficiency



S-1: Able to satisfy routine travel needs and minimum courtesy
requirements. Can ask and answer questions on topics very familiar to him; within
the scope of his very limited language experience can understand simple
questions and statements, allowing for slowed speech, repetition or
paraphrase; speaking vocabulary inadequate to express anything but the most
elementary needs; errors in pronunciation and grammar are frequent, but
can be understood by a native speaker used to dealing with foreigners
attempting to speak his language; while topics which are "very familiar"
and elementary needs vary considerably from individual to individual, any
person at the S-1 level should be able to order a simple meal, ask for
shelter or lodging, ask and give simple directions, make purchases, and
tell time.

R-1: Able to read some personal and place names, street signs, office 
and shop designations, numbers, and isolated words and phrases. Can recog- 
nize all the letters in the printed version of an alphabetic system and 
high-frequency elements of a syllabary or a character system. 

Other government agencies have further subdivided the skills and devised 
rating scales for listening and writing proficiency. 



Dealing specifically with the FSI oral interview, how well does it meet the
criteria suggested above? 

1) The procedure can be used to assess the full range of an individual's oral
skills.

2) From the rating descriptions, it appears that many different domains of 
language use are being assessed (e.g. can order a meal, ask directions). However, 
it is unclear how well one can assess language use in a variety of domains in such
a short time. 

3) The inter-rater reliability in the oral interview situation is very high
(Clark, 1975). What is not known is whether the measured proficiency of the
respondent fluctuates from day to day. Thus he might receive a variety of ratings were he
retested on several consecutive days. Also, it should be emphasized that FSI
maintains extensive training and recalibration programs for its interviewers. Thus,
this high inter-rater reliability is quite costly.

There are no data on the predictive validity of the test (i.e., how well
respondents actually perform "in real life" in a number of sociolinguistic contexts).
Constructors of the test state that it is highly face valid; however, many have
taken issue with the apparent "naturalness" of the testing situation (e.g., comments
in Jones and Spolsky, 1975). It is important to keep in mind that because it is
a testing situation (and not a tea-party) it can never be totally natural. Clearly, 
any time a person knows that his performance is being formally evaluated, the situa- 
tion becomes somewhat "unnatural" for him. 

Lastly, while this procedure may be quite easy to administer, scoring tends to
be difficult and expensive in terms of interviewer training time and sophistication.

The Dailey Oral Language Facility test (Dailey, 1968) as adapted by Cohen (1975) 
is an attempt to adapt rating scale procedures for use with children. Here the 
children are asked to tell stories about different pictures which represent three 
different social domains (home, school, and neighborhood). The stories are then 
rated by two raters on a number of 5-point scales (e.g., general ability to
communicate, fluency, grammar, pronunciation, rhythm, intonation). This test is similar
to the BINL except that the analyses of the data are global. It is similar to the 
FSI procedure, except that the stimulus situation is more closely controlled. 

Generally, oral interview and rating techniques are not widely used outside 
government agencies for several reasons. The most important reason is that they 
are very expensive to maintain. As indicated above, FSI interviewers are highly 
trained specialists who are required to return frequently for retraining and
recalibration. Extensive research on language and attitudes has indicated that
untrained raters often make highly biased judgments about a person's language ability
based on non-linguistic variables (e.g., sex, race, dress, etc.). A secondary reason
is that the use of such a technique in different language use situations (classroom,
vocational) and with different age groups would involve completely reformulating
the interview procedures and the criteria for evaluating an individual's
performance. Thus, the technique is expensive both to maintain and to initiate. In
fairness to the approach, it must be admitted that we do not yet know the minimum
amount of interviewer training which is necessary to achieve reasonable reliability
on various scales used in different contexts. The possibility certainly exists
that acceptable results could be obtained in some situations and with some groups
using different, less costly training procedures than those used by FSI. Although
the Dailey has not been thoroughly developed to date, it may be a start in this
direction.

Indirect-Integrative Tests. These are tasks which do not have a high degree of
face-validity, but purport to measure "global" language proficiency.

One set of tests in this group is termed "reduced redundancy tests" (Spolsky,
1971). The main rationale underlying these tests is that there is a great deal of
redundancy in language which is particularly useful to the non-native speaker as
he makes guesses about the meanings of utterances that he hears or reads. If this
redundancy is removed, it should be much more difficult for him to continue to
communicate.

Redundancy can be removed in a number of ways. In the Cloze Test (Taylor,
1953), redundancy is reduced in a reading task by deleting every nth word in a
paragraph, and the respondent is required to supply the missing words. Scoring
involves counting either the number of exactly supplied words or the number of
contextually acceptable responses.
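
The cloze procedure just described lends itself to a small sketch. The code
below is only an illustration under stated assumptions: the blank marker, the
punctuation handling, and the function names are hypothetical, and the lists of
"contextually acceptable" words would in practice have to be supplied by a
human judge.

```python
PUNCT = ".,;:!?\"'"


def make_cloze(text, n=5, blank="_____"):
    """Delete every nth word; return the gapped text and the deleted words."""
    words = text.split()
    answers = []
    for i in range(n - 1, len(words), n):   # positions n, 2n, 3n, ... (1-based)
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


def _norm(word):
    # Compare words without case or surrounding punctuation (an assumption).
    return word.strip(PUNCT).lower()


def score_exact(answers, responses):
    """Count responses that reproduce the deleted word exactly."""
    return sum(_norm(a) == _norm(r) for a, r in zip(answers, responses))


def score_acceptable(acceptable, responses):
    """Count responses found in a judge-supplied set for each blank."""
    return sum(_norm(r) in acc for acc, r in zip(acceptable, responses))


if __name__ == "__main__":
    passage = "The quick brown fox jumps over the lazy dog again today"
    gapped, answers = make_cloze(passage, n=3)
    print(gapped)
    print(score_exact(answers, ["brown", "above", "dog"]))
```

The two scoring functions correspond to the two counting methods mentioned
above; the acceptable-response method will never give a lower score than the
exact-word method when each acceptable set includes the original word.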

The correlation of this test with other tests of language proficiency is quite
high: .83 with the UCLA language proficiency test, .73 with the TOEFL listening
comprehension test (Darnell, 1970; Oller, 1972).

Another test of reduced redundancy is the dictation test (with or without
noise). In the traditional dictation test (without noise) the person is read the
dictation and he writes it down (Gradman and Spolsky, 1975). The number of errors
is counted and subtracted from a base line score. Such a test was found to
correlate .94 with the UCLA English Language Proficiency test. It also correlates highly
with the Cloze test (Oller and Streiff, 1975). It is called a reduced redundancy
test in that many of the cues used in natural situations are removed. If a person's
internal grammar is incomplete, "the kinds of hypotheses that he will make will
deviate substantially from the actual sequences of elements in the dictation."
Oller mentions, as an example of this, the student who converted a phrase "scientists
from many nations" into "scientists' imaginations" (Oller and Streiff, 1975).
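
As a rough illustration of the dictation scoring rule (errors counted and
subtracted from a base line score), the sketch below aligns the dictated passage
with the transcription word by word. The alignment method, the way substitutions
are counted, and the base line of 100 are assumptions made for the example; the
cited studies do not specify them here.

```python
from difflib import SequenceMatcher

PUNCT = ".,;:!?\"'"


def _words(text):
    # Lower-case and strip surrounding punctuation before comparing.
    return [w.strip(PUNCT).lower() for w in text.split()]


def count_errors(dictated, transcript):
    """Count word-level errors between the passage and the transcription.

    Assumption: each mismatched stretch costs as many errors as the longer
    of its two sides (omitted, added, or altered words).
    """
    ref, hyp = _words(dictated), _words(transcript)
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors += max(i2 - i1, j2 - j1)
    return errors


def dictation_score(dictated, transcript, baseline=100):
    """Subtract the error count from a (hypothetical) base line score."""
    return baseline - count_errors(dictated, transcript)


if __name__ == "__main__":
    print(dictation_score("scientists from many nations",
                          "scientists' imaginations"))
```

Applied to the example above, the three words "from many nations" are replaced
by a single word, which this counting rule treats as three errors.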

The reduced redundancy test with noise involves giving the student a number
of sentences in the target language which have been masked by the introduction of
white noise (Gradman and Spolsky, 1975). The student attempts to write out, or
repeat each sentence. This test has been validated against various tests: TOEFL
(.75), TOEFL Listening Comprehension (.89), TOEFL Vocabulary (.85), and the Ilyin
Oral Interview (.69).
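
The validity figures quoted in this section are correlation coefficients between
scores on two tests taken by the same respondents. A minimal sketch of the
usual product-moment (Pearson) computation follows; the report does not state
which coefficient the cited studies used, so that choice is an assumption, and
the scores in the usage lines are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    noise_test = [12, 18, 25, 31, 40]        # invented scores
    criterion = [410, 455, 500, 560, 610]    # invented scores
    print(round(pearson_r(noise_test, criterion), 2))
```

A coefficient near 1 means that respondents are ranked in nearly the same way by
both tests, which is the sense in which a new measure is said to be validated
against an existing one.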

These reduced redundancy tests all share a common set of problems, as well as 
advantages. The tests are heavily dependent upon orthography (at least in their 
present forms), and as a result it seems unclear how directly they actually measure 
oral skills. The tests do not seem well suited for investigating language proficiency 
in various domains, since it appears difficult to construct these types of tests to 
measure a person's ability to communicate with a certain person in a specific setting. 
In most cases the tests seem fairly easy to administer and score. Perhaps the
biggest question associated with all integrative-indirect tests concerns their validity.
Clark (1975) contends that the ultimate usefulness of such tests will rest on the
magnitudes of correlations between them and more direct measures (specifically
FSI-type tests). Nevertheless, the evidence provided by concurrent validation with
other relatively indirect measures plus their ability to be employed efficiently
and economically is encouraging.

Another category of indirect integrative measures includes word naming and
word association tasks. Macnamara (1969) defines these as brief, economical measures
to assess undifferentiated degrees of bilingualism. Because these measures have
typically been used to assess degree of bilingualism, they are usually administered
in two languages. However, they also can be administered in one language as a
test of general proficiency. They have been used to assess language usage in
different domains (cf. Fishman, Cooper, and Ma, 1971) and are very easy to administer
and score. Their validity will be discussed below.

The last variety of integrative indirect tests to be discussed is that of
self-report. Here the subject rates his own language proficiency. Depending upon
how the interview questions are structured, he can be asked to rate his proficiency 
in a number of different domains or situations (church, school, in a restaurant, 
giving directions). The rating scale itself can be made up of any number of points 
with as much description or definition of each point as the test constructor cares 
to make. These scales have the advantage of being very easy to give and very easy 
to score. There are many unanswered questions about the utility of the rating 
scale, and the validity of the approach is controversial (see below). It is clear 
that young children cannot rate their own proficiency, and that parents' or teachers'
ratings of children's proficiency might not be valid. For example, teachers' ratings
could be influenced by attitudes and stereotypes about the child which are
non-language related. We do not know how accurately a parent can rate his child's
proficiency in a language if the parent does not see the child use the language and/
or does not know the language himself. Also the ratings might be affected by such
variables as humility, and social pressures to respond in certain ways.

As noted above, we also have little information on the validity of these rating
scales. Arsenian (as cited in Macnamara) cites validity estimates of about r = .80
obtained by correlating a language background questionnaire (a series of questions
about respondent's and family's language use in different situations) with ratings
of linguistic proficiency made by interviewers. Macnamara attempted to relate a
series of indirect measures (language background, self-rating, word naming, reading
speed, word detection and word completion) to a number of "more" direct and
standardized measures of language proficiency (Gates reading test, a listening
comprehension test, a story telling test).* He used the direct tests as criterion
variables and the indirect tests as predictor variables. While he found that the
language background questionnaire was not a good predictor of performance on the
direct tests, the self-rating scales were powerful predictors. Macnamara had the
subjects rate themselves on four different scales (reading, writing, speaking,
listening). However, in his analysis, he found that little accuracy was lost by
combining the four ratings into one. Of all the indirect measures, he found that
the self-rating of "speed of reading" was the most powerful predictor of bilingual
skills; this, however, is probably due to the fact that many of the criterion tests
involve this skill. Other indirect tasks contributed in less powerful ways to the
prediction of the criterion tests.

In our review of language proficiency tests we realize that we have not pro- 
vided an exhaustive list of all available measures. Rather we have attempted to 
sample and furnish a critique of those that are more commonly used and those which 
show promise of being good measures. 



* Macnamara was interested in assessment of bilingual proficiency and thus
administered the above tests in English and French. He obtained difference scores on
each test and correlated these among tests. However, his results are interesting
for those concerned with the measure of language proficiency.
4. Language Assessment Instruments in the MELP Project



Possible MELP Instruments. With respect to the MELP (that is, the instrument used
to identify LESA individuals in the SIE), the two most tenable approaches have
already been mentioned in Chapter I: (1) a set of opinionnaire-type questions to
be answered by a Household Respondent about the English proficiency, use patterns,
and history of each member of the household, and (2) a direct rating or scoring
system completed by the interviewer during the interview. The prohibition by
Census of any obvious testing ruled out anything but these approaches. In the
second option above, the rating and scoring procedures would have to be designed
as essentially covert measurement. That is, the interviewer would assign a
proficiency score to the respondent without the respondent being aware that his English
was explicitly being assessed. If the interviewer were simply to rate the
respondent's proficiency in a way analogous to an FSI rating, it would qualify as
integrative and relatively direct. It would be indirect only in the sense that the
household interview situation does not obviously sample directly from language use
requirements in instructional settings. However, if the interviewer were to
observe and code (perhaps on a checklist) a set of features as they occurred during
the interview (e.g., various sentence types, verb tenses, dependent clauses, etc.),
the assessment would qualify as a discrete point direct test. Fisher discusses
this approach as follows:

Specialists in applied linguistics have knowledge of the components
and dimensions of phonology (accents, sounds, some dialect features), of
lexicon, of syntax and of utterances to be used to characterize oral
production and aural comprehension. (Parenthetically: Bilingual interviewers
or non-verbal behavioral response indicators may be necessary, where an
individual comprehends but does not speak English.) Applied linguists are
aware of certain central "diagnostic" linguistic features of adequate and
inadequate English language usage and comprehension. If they do not
already know which of these linguistic features are most highly correlated
with other features of English language usage, they can determine this
empirically in R & D work at the educational sites. (The purpose of
this is to shorten the list of language behaviors to be observed, for
entering into an assessment of ELP made by trained interviewers. The aim
is practical while maintaining a list of critical items long enough
for MELP reliability.) (p. 5)
We suspect that Fisher is overly confident of linguists' knowledge of
linguistic features that are particularly diagnostic of overall proficiency. It is
exactly that "knowledge", as exemplified in discrete point tests, that has recently
been called into question by Jones and Spolsky (1975). It is important to note
that the rating and the scoring approaches were never thought of as anything but
possible last-ditch, fall-back MELPs, to be considered only if ratings by a
Household Respondent proved a complete failure. They were considered as such because
they would require the interviewer to converse face-to-face with each
individual being given a LESA - non-LESA categorization.

Possible Criterion Instruments. With respect to possible criterion instruments --
i.e., instruments to use as standards against which to develop and calibrate the
MELP -- the restrictions as to form were somewhat less severe.

Clearly, discrete point indirect tests were prime candidates for the following 
reasons : 

1. They are easy to administer and score. 

2. They need not involve paper and pencil. 

3. A number of them have been developed, all or parts 
of which might be usable. 

4. While more controversy about their validity is present
now than ever before, discrete point tests still have
the largest single block of adherents in the testing
community.

5. Discrete point tests lend themselves particularly well 
to measuring formal English in an educational domain. 

Discrete point direct tests (such as the BINL) were seen as a mixed blessing.
On one hand, they involve, by definition, verbal interaction situations which are
at least somewhat related to typical classroom interactions between student and
teacher. On the other hand, however, they generally involve a higher level of
training on the part of the tester and the scorer (particularly if they are the
same person). The interviewer needs to be skilled in eliciting speech from the
respondent in relatively unstructured situations. This becomes very difficult with
young children, especially when little time is available to establish rapport.
Since the respondent's free responses must be analyzed for particular structures,
vocabulary, etc., either the session must be tape-recorded and
possibly even transcribed for later analysis, or two people must be involved in the
testing: an interviewer and a scorer. Either of these alternatives is unattractive
within the context of the present project, with its staff of 100 or more
interviewers (calculated at one interviewer present per interview) and a very few
weeks to collect the data and score the criterion instruments. Thus, the discrete
point direct approach was not given high priority.

Reduced redundancy tests were not prime candidates for two reasons: first, 
their validity as a global assessment of comprehension and speaking is somewhat 
controversial and, second, the dependence of these methods on respondents' reading
and writing skills made them generally unacceptable.

This left two approaches: the discrete point indirect approach, which has
already been discussed, and the integrative, relatively more direct approach
exemplified by the FSI Oral Interview. As applied to the present project, an
integrative direct assessment would be one where the interviewer sets up a situation which
would "call out" some of the skills necessary for performing adequately in an
English-language classroom. Although no great amount of detail is known about
exactly what those skills are, they clearly involve oral expression
and receptive skills. Thus, the general sort of situation which suggests itself is
one in which the interviewer engages the respondent in conversation and requests
information, a narration, or statements of opinion. On the basis of that verbal
interaction, then, the interviewer would rate the respondent on one or more scales
of English proficiency. The advantages of this sort of procedure include its being
more directly related to classroom interactions than are indirect discrete point
tests and quicker and easier to score than direct discrete point tests. Its chief
disadvantage is that a good deal of interviewer skill may be called for, both in
gaining the proper rapport with the respondent so as to obtain a representative
sample of the respondent's verbal behavior, and in retaining an appropriate
degree of objectivity in scoring to maintain reliability across a variety of social
classes, ages, and ethnic groups. Clearly, the instructions given to the
interviewer and his or her perception of this sort of task are crucial here. (An
additional complication is that interviewers are generally trained to do everything
in the interview strictly according to the manual, both with respect to asking
questions and recording responses. Thus, a relatively unstructured activity such as
this one is often difficult for interviewers to do correctly.)

Given this preliminary review and discussion of the general approaches to
English proficiency, Chapter III will describe the specific instrument development
activities engaged in to produce both possible MELP instruments and the criterion
measures which were then employed in the field test described in Chapter IV.
Instrument Development and Refinement

Activities related to the development and refinement of instruments began on
June 1, 1975 and ended on July 18 when RTI held its initial training session for
field test site supervisors. Most of the work was done in San Francisco, a site
chosen for its varied ethnic populations and the relatively large numbers of
limited English speakers. In particular, initial testing of possible instruments
was done in the Latino, Chinese, and Filipino communities there. An additional
consideration in locating in San Francisco was that CAL already had many civic and
academic contacts in the area and thus could quickly recruit local personnel
trained in linguistics and the social sciences to do the work.

A brief narrative of the principal activities which took place during this
period can be found in Appendix 3.

During this phase, the staff organized itself into a number of overlapping
teams, depending on the instrument to be developed and the ethnic group memberships
of the team members. Since the time schedule was so short, instruments were
constructed and tested in households, the data analyzed, and revisions implemented
in a matter of days at most and sometimes in a matter of hours. Statistical
analyses such as standard item analyses and correlations among scales within and across
the three ethnic groups were done by hand and by using the Stanford University
Computation Center. While these quantitative results were available and played
some role in the development of the instruments, the largest factors in this phase
of activities were the informal observations and intuitions of the staff and
consultants who worked in San Francisco. As indicated in the Appendix, this group
included both individuals with intimate knowledge of the ethnic groups and languages
of interest and individuals with extensive experience in language testing, social
science research, and public education in San Francisco. It was only this unique
blend of qualifications in the staff that made possible the production of some
instruments in a five week period.

1. Development of Discrete Point Tests

The LESA - non-LESA distinction as legislatively defined appears to have three
main foci: comprehension skills, speaking skills, and these as they are needed in
instructional settings. Thus, it was desirable to address our discrete point tests
to each of these.

Tests of Comprehension. Existing comprehension tests have as a common property the
following format: The interviewer pronounces a sentence or series of sentences and
the respondent makes some sort of response from which it can be deduced that he
"understood" the stimulus material. The response should be either non-verbal or
minimally verbal so as not to confound comprehension with production skills. A
common response is for the respondent to point to the one of several pictures that
best illustrates the stimulus utterance. Knowledge of vocabulary and word order
are particularly easy to index in this way. Another sort of receptive test is to
give the respondent two sentences and have him indicate whether their meanings are
the same or different.

Tests of Speaking. Many of these tests are available but nearly all of them tacitly
assume that the respondent's comprehension skills are equal to or more advanced than
his productive skills. Thus, they typically require the respondent to both
understand and speak in order to correctly answer an item. Since these are discrete
point items, each is focused on a particular linguistic feature or structure. A
typical format is for the stimulus to include a sentence spoken by the interviewer,
often a question, and usually referring to an object or picture which is present.
The respondent then must respond with an utterance that is both semantically
appropriate and syntactically correct. The stimuli are designed so that the
responses from native speakers will have a very high probability of containing the
feature being tested.

Tests of Communication. Although both speaking and understanding of language are
clearly called for in instructional settings, the ultimate requirement is that
communication occurs between student and teacher. Thus, it was appropriate to
look for a test that would involve some sort of overall communication task.
Several of these exist or are under development. They usually involve some sort of
task-oriented interaction between interviewer and respondent or among two or more
respondents. The task is structured so that it cannot be accomplished without
information being transmitted verbally, and it is easily determined when the
solution has been reached. An example would be a two-person task where one has a set
of blocks and the other a picture of how they are to be arranged. The object is
for person 1 to duplicate the pattern in person 2's picture. While the relevance
of such a task to everyday classroom communication requirements is arguable, it is
a step toward forcing the respondent to use his linguistic skills in a communication
context rather than in isolation.

Comprehension, production, and communication skills were thus the three
principal foci of the test development effort, although other alternatives were pursued
to some extent as discussed below.

There were several phases to the development activities. The first involved a
massive search of all available materials on English Language Proficiency. From
this set a number of tests were found which met many of the criteria of the project.
This set was further scrutinized, then reduced, edited, and amended for pilot
testing in San Francisco. The next phase involved changing or eliminating items
on individual tests based on pilot work with them in the Latino, Chinese, and
Filipino communities in San Francisco. The LGRs' reactions to them also played
an important role in this process (see Section III.6). During this phase whole
tests were dropped from the battery. What emerged from these operations were two
criterion batteries -- one for children and one for adults -- which were then used
in the field tests reported in Section IV.

Below we will present only the development of instruments which eventually 
found their way into the final tests; however, appended to this report is an account 
of our work with all instruments which we seriously considered and developed to some 
extent but which we did not include in the final tests (Appendix 4).

The Oral Communication Test (OCT). This test was developed by Upshur (1971)
and was used in the present study to test communication skills of children and
adults. It is an individually administered test for adults of ability to
communicate in a foreign language, and had been used with respondents as young as 10 years
old. The test contains thirty-six communication tasks.
Upshur (1971) describes the tasks as follows:

(1) The examinee is presented with four pictures differing significantly
on one or two conceptual dimensions. These (pictures) may represent, for
example, a person performing four different 'actions', or the four
conjunctive possibilities of a man with or without a hat walking up or down a
staircase.

(2) The examinee is instructed to provide a single sentence description
to a visually remote audience of one picture which is randomly selected
from the set.

(3) The audience -- who is the examiner -- makes a best guess as to which
picture is being described.

(4) The examinee's directed intentions (about which picture to indicate)
are compared with the examiner's guesses (1971:438).

The test yields two scores: the number of messages successfully communicated, and
the time required for communication.
Respondents are first given oral instructions and four unscored example
tasks. If they are unable to perform two of the last three examples, testing is
not continued. Each subject is presented with a key in the form of a list numbered
from 1 through 36. Following each of these numbers is a letter: A, B, C, or D.
These letters refer to the one picture in the four picture set which the subject
is to identify by his utterance. Different keys are used; in each key the pictures
indicated have been randomly selected in order that the examiner cannot learn
which pictures a subject is attempting to indicate.
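
The randomized key described above can be illustrated with a short sketch. This is not the procedure Upshur or the project staff actually used to prepare keys, which the report does not describe; the function name `make_key` and its parameters are assumptions introduced only for illustration.

```python
import random


def make_key(n_items=36, letters="ABCD", seed=None):
    """Return a hypothetical respondent key: one target picture per task.

    Position i of the returned list holds the letter of the picture the
    respondent is to describe on task i + 1. The seed parameter is an
    illustrative assumption so that a given key can be reproduced.
    """
    rng = random.Random(seed)
    return [rng.choice(letters) for _ in range(n_items)]


# Two keys built from different seeds cover the same tasks but will
# generally designate different target pictures.
key_one = make_key(seed=1)
key_two = make_key(seed=2)
```

Because each key is drawn independently, an examiner who has administered the test with one key gains no knowledge of the targets in another.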

The stimulus pictures measure two and one-half by two and one-half inches.
These are aligned horizontally on a card measuring six by twelve and one-half
inches. In the upper right corner of the card is the number of the test task:
1-36. Below the four pictures are the letters A-D reading from left to right.
The thirty-six test cards and four example cards are placed before the respondent
in a stack face up. The respondent's key is placed facing him and closer to him
than the picture cards.

When the respondent is ready to attempt an item he refers to his key and turns
over the currently exposed card in order to reveal the item he will attempt to
communicate. He is given three seconds to examine the set in order to see the
significant differences among the four pictures. Then the examiner gives him a
cue to respond, saying either, "Describe the correct picture," or, "What is the
man doing?" As soon as the cue is given the examiner begins timing the respondent
with a stop watch. Timing is stopped as soon as the respondent has completed his
single sentence description, or at the end of twenty seconds. The examiner
records his guess of the keyed picture for each item according to the respondent's
utterance. No attempt is made to evaluate linguistic aspects of a respondent's
speech.
After the test session, the examiner compares the respondent's key with his own
recorded guesses. The number of corresponding numbers is the respondent's message
score. The total time used in responding to the thirty-six items is the time score.
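
The two scores just described amount to a count of matches and a sum of times. A minimal sketch follows; the function name `score_oct` and the list-based data layout are assumptions made for illustration, not part of the published test materials.

```python
def score_oct(key, guesses, times):
    """Score an Oral Communication Test session (illustrative sketch).

    key     -- letters the respondent was asked to describe, one per task
    guesses -- letters the examiner recorded, in the same task order
    times   -- seconds taken on each task
    Returns (message_score, time_score).
    """
    # Message score: tasks on which the examiner's guess matches the key.
    message_score = sum(1 for k, g in zip(key, guesses) if k == g)
    # Time score: total response time over all tasks.
    time_score = sum(times)
    return message_score, time_score


# Hypothetical four-task session: three messages communicated, 30 seconds used.
example = score_oct(list("ABCD"), list("ABDD"), [3.0, 4.5, 20.0, 2.5])
```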

The following modifications to the test were made during the San Francisco
pilot work. No time limits were set: the subject could look at the stimulus for
an unspecified length of time before he responded. He could take as long as he
wanted to respond. This modification was made because it was felt that a time
restriction might penalize Navajo speakers who reportedly have long latencies in
conversations as a normal characteristic.

All communication tasks were arranged in a booklet. For each task an "X" 
was put below the stimulus to be described. There were four different sets of 
materials: all contained the same items but differed in terms of the specific 
picture in each item to be described. As mentioned before this was done so that 
the examiner would not become familiar with the stimuli and memorize the sequence 
of correct answers.

Other amendments were also made as a result of field experience. The number
of communication tasks was eventually reduced from 36 to 15 and all pictures were 
redrawn to make them more realistic. Although time scores were taken, they were 
not used for the final analysis. Otherwise the scoring procedure was the same as 
that described by Upshur. 

The Adult Production Test (APT) was adapted from the Ilyin Oral Interview 
procedure (Ilyin, 1972). The test was developed to test an adult ESL speaker's 
oral proficiency in English. In the original procedure, the respondent is shown a 
picture and asked a question to elicit a specific grammatical structure. There 
were 50 items in the test. Each response could receive a maximum of 4 points; 
1 for information, 1 for word order, 1 for verb structure, and 1 for other gram- 
matical elements. 
In the first phase of the San Francisco testing the following modifications
were made. The test was given to adults and children, and the instructions were
simplified. Thirty of the original items were used. These items
had been specified by Ilyin (personal communication) as being the most
discriminating.

The instrument was further modified during the pilot activities. It was too
difficult for children and thus was given only to adults. The items were further
reduced to 16. All pictures were redrawn to make them more realistic. The scoring
procedure was simplified. Each response could receive a maximum of two points:
one point for correct information, and one additional point if the grammatical
structure of the response was correct as well. Also, after failures on five
consecutive items, the test was discontinued for that respondent.
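
The simplified scoring and the discontinuation rule can be sketched as follows. The report does not define "failure"; this sketch assumes a failure is an item earning zero points, and the function name `score_apt` and the tuple layout are likewise illustrative assumptions.

```python
def score_apt(responses, stop_after=5):
    """Score an Adult Production Test session (illustrative sketch).

    responses  -- sequence of (info_correct, grammar_correct) booleans,
                  one pair per item, in administration order
    stop_after -- number of consecutive failures that ends testing
    Returns (total_points, items_administered).
    """
    total = 0
    consecutive_failures = 0
    administered = 0
    for info_ok, grammar_ok in responses:
        administered += 1
        # One point for correct information; a second point only if the
        # grammatical structure is correct as well.
        points = (2 if grammar_ok else 1) if info_ok else 0
        total += points
        # Assumption: a zero-point item counts as a failure.
        consecutive_failures = consecutive_failures + 1 if points == 0 else 0
        if consecutive_failures >= stop_after:
            break
    return total, administered
```

With 16 items at two points each, the maximum total under this scheme is 32, which agrees with the possible points listed for the APT in the summary table.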

The Adult Comprehension Test (ACT) was based on the items of the CELT (Upshur,
et al., 1964). The CELT was developed to test English Proficiency in adult speakers
of ESL. Our interest was in the Listening section of the test, which is composed
of two parts. In part 1 the subject hears a question and then has to select from
four written alternatives the best response. For example, the respondent hears "When
are you going to New York?" and then reads the following alternative answers:

a) to visit my brother 

b) by plane 

c) next Friday 

d) I am 

He then marks the most appropriate one. There are 20 such items. Part 2 is
composed of 20 items. Here the respondent hears a sentence such as "George has just
returned home from vacation" and then reads four alternative sentences:

a) George is spending his vacation at home. 

b) George has just finished his vacation. 
c) George is just about to begin his vacation.

d) George has decided not to take a vacation. 

He is asked to mark the sentence which is closest in meaning to the one he has
heard.

The basic idea behind the test was intriguing even though the form had to be
greatly changed because a paper and pencil test was undesirable. As modified by
CAL, Part 1 required the examiner to ask a question. He then orally gave the
respondent two different answers. The respondent had to indicate which one was
best. In part 2 the examiner said two sentences. The respondent was asked to
indicate whether they were the same or different in terms of meaning.

Since time pressures dictated a speedy start in testing and revising this
instrument for use in the field test, the necessity of negotiating with the
publisher for permission to make modifications was circumvented by simply using the
general logic and format of the items but entirely recreating the test ourselves
with all new items. Even so, of course, many of the same language structures were
tested as are tested in the CELT.

There were 30 question and answer items and 43 sentence pairs. Both children
and adults were given the test. By the end of the San Francisco phase the
following modifications had been made.

a) The Question-Answer section was totally eliminated. Examiners reported
that the task was too difficult. One of the major reasons for this seems to be
that there was no context for these questions.

b) The task was too difficult for children. It was only given to adults. 

c) The final number of items was reduced from 43 to 10. The 10 surviving 
items were selected on the basis of having high part-whole correlations with the 
total score of the 43 items. The resulting instrument was called the ACT or 
Adult Comprehension Test. 
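
Item selection by part-whole correlation can be sketched as below. This is a sketch under stated assumptions rather than a reconstruction of the project's analysis: it assumes items are scored 0 or 1, uses the uncorrected Pearson correlation between each item and the total score (the report does not say whether the item was removed from the total), and the names `pearson` and `select_items` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_items(scores, k):
    """Pick the k items whose scores correlate most highly with the total.

    scores -- one row per respondent, one 0/1 entry per item
    Returns (sorted indices of the selected items, list of item-total correlations).
    """
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    correlations = [
        pearson([row[j] for row in scores], totals) for j in range(n_items)
    ]
    ranked = sorted(range(n_items), key=lambda j: correlations[j], reverse=True)
    return sorted(ranked[:k]), correlations
```

Applied to the pilot data, a call such as `select_items(scores, 10)` on a 43-item score matrix would correspond to the reduction from 43 items to 10 described above.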
MAT-SEA-CAL. This test was developed by Joseph Matluck and Betty Mace-
Matluck (1975) under the auspices of the Seattle Public School Board and the Center
for Applied Linguistics. It was developed to measure the child's ability to
understand and produce distinctive characteristics of spoken English. It was originally
intended for children in Kindergarten through Grade 4. CAL adapted sections of
this test which were eventually used to measure English receptive and production
abilities in children.

Part 1 of the original test had 27 items. For items 1-17, the examiner says
a sentence and the child points to one of four pictures which best gives its meaning.
In items 18-27 the examiner gives a command (e.g., "Stand up") to which the child
responds.

In the pilot work, the commands were eliminated from this section because they
were too easy and thus did not discriminate between good and poor proficiencies,
only between poor and no proficiencies. Minor modifications were made throughout
the pilot-test to items 1-17, and the final instrument was composed of 12 items
derived from the original ones. The pictures were redrawn to make them more
realistic and the number of alternatives in each item was reduced to three. As in
the other tests described, administration was terminated after 5 consecutive
failures.

Part 3* of the MAT-SEA-CAL is called "structured response" and is meant to
test oral production. The task is very similar to the Ilyin Oral Interview
described above. The respondent is shown a picture and asked a question about it.
The question is designed to elicit a specific grammatical structure from the
subject. There were 28 items in the original MAT-SEA-CAL, each worth one point if
the response was grammatically correct.
* Part 2 is an imitation task. It was never considered in that an imitation
procedure was built into the ETS test discussed in the appendix. Results of that test
indicated considerable difficulties in scoring an imitation test; thus, even when
the ETS test was dropped, the MAT-SEA-CAL imitation section was not considered.



In the pilot work, the test was given to children up to 14 years much as
described above. The following modifications were incorporated into the final
items.

1) 20 of the 28 items were retained 

2) the pictures were redrawn 

3) the scoring procedure was changed. Each answer was given one point
for correct information, and one additional point if it was also
grammatically correct.

To summarize, the following table shows which tests were used in the final
battery, to whom they were given, and what each was meant to measure. All tests
were discrete point and indirect in their general approaches to the measurement of
language proficiency.



Name of Subtest                     Measures        No. of Items   Possible Points

Adults
1. Adult Comprehension Test (ACT)   Reception            10              10
2. Adult Production Test (APT)      Production           16              32
3. Oral Communication Test (OCT)    Communication        15              15
   Total                                                 41              57

Children
1. MAT-SEA-CAL-I                    Reception            12              12
2. MAT-SEA-CAL-II                   Production           20              40
3. Oral Communication Test (OCT)    Communication        15              15
   Total                                                 47              67



The developing and final forms of these tests are reproduced in Appendices 
9 and 10 respectively. 
2. Development of the Direct Observation Rating Procedure (DORP)

There were two motivations for developing this measure: 

1. To serve as a criterion measure which would qualify as a direct
measure of English proficiency based on face-to-face interaction and
observation.

2. To provide a "back-up" MELP instrument in case none of the
opinionnaire-type questions were satisfactorily predictive of the criterion
measures.

Since the constraints of the project dictated that it must be administered by an 
interviewer (rather than a teacher) in the household (rather than in a school),
there were severe limitations on just how direct a measure the DORP could be. One 
way in which directness could be preserved was to develop the descriptors of the 
scale positions with the help of teachers rather than linguists or researchers. 
Teachers were also consulted in the formulation of the speech elicitation situation. 

Procedure: The DORPs for children and adults were developed
separately but in parallel. In both cases, several steps were involved.

1. Elicitation and recording of free-speech, both conversation and narration 
from respondents of various ages, linguistic backgrounds, and English proficiency. 

2. Elicitation from teachers of ratings of the speech samples plus comments 
on the properties of the samples that determined their ratings. 

3. Compilation of these data into descriptions of a graduated scale of 
English proficiency. 

a. Elicitation of speech samples: In the course of data collection
involved in refining other instruments, recordings were made of brief conversations
between interviewer and respondent. The respondents were asked a range of
open-ended questions such as "What is the most exciting thing that ever happened to
you?" "What is your favorite TV program?" "Tell me about your best friend."
etc. Respondents were then shown a book of photographs, asked to describe
several photos, and asked to tell what they thought was happening in each picture.
Such data were collected from 15 children and 8 adults. The children ranged
in age from 6 to 13 and included Latinos, Chinese, and Filipinos. The ages
of the adults ranged from 18 to 70 with all three ethnic groups represented.
The speech samples were then copied onto two master tapes, one for children and
one for adults.

Special thanks go to Amador Bustos, Carolyn Karelitz and William Sinclair for
their contributions to the development of this measure.

b. Judgments of speech samples by teachers: Two sets of teachers were employed
to judge the speech samples. The 24 teachers judging the children's tape were 
all certified, employed elementary school teachers in the Bay area. All had 
had experience with children whose native language was not English. Fourteen 
teachers judged the adult samples. They were all actively teaching in adult 
education programs in the Bay area. All teachers made their ratings in groups 
of from six to 14 people. The procedure was as follows: 

1. The need to develop a direct observation scale was explained. 

2. Each teacher was provided with a form on which to rate each sample
and write comments about it (see Figure 1). They were to use a seven-step
rating scale.

3. Before hearing each sample the teachers were told the age of the person 
whose speech was to be heard. 

4. The first two samples to be heard were the least proficient and most
proficient of the group as judged by the project staff. The teachers were told 
that they were to rate them as 1 and 7 respectively. 

5. As each sample was played, teachers were asked to make their ratings
and then to write as completely as possible the reasons why they rated the
speaker as they did, paying special attention to specific features that they
had noted in their experience as being predictive of academic success or failure
in a non-native English speaker.

6. After the samples had all been played, a group discussion was initiated 
about language requirements for success in the classroom. 

7. The session lasted two to three hours overall and each teacher was 
paid $25 for participating in it. 

c. Analysis of responses: The data analysis was essentially the same for adults 
and children. First, the mean rating and its associated standard deviation were 
computed for each sample of speech. Speech samples eliciting widely divergent 
ratings from the teachers (as evidenced by high standard deviations) were elim- 
inated from further consideration. A list was then made of all the teachers' 
descriptive comments for the samples remaining at each step of the scale and a 
content analysis was made of the comments about the samples in each step. The 
comments were categorized with respect to the following aspects of speech behavior:

1. Fluency: hesitancy or quickness of response, need for prompting. 

2. Comprehension: comprehension of questions and instructions, of sequences 
of events, ability to draw inferences. 

3. Sentence Structure: Complexity of sentences, word order, use of prepo- 
sitions, articles, and verb tenses, variety of sentence types. 

4. Vocabulary: Use of adjectives, slang, words from the native language,
and colloquialisms.

5. Pronunciation: Interference, intonation, accent. 

Next, a seven-column (mean rating positions) by five-row (dimensions of
language evaluation) matrix was constructed. Each cell contained all comments about
all samples occupying that particular scale position dealing with that particular
dimension of evaluation. Separate matrices were constructed for adults and
children. Inspection of those matrices immediately indicated that there was no
apparent difference between the descriptors of positions 2 and 3 on one hand and
5 and 6 on the other. Thus, the scale was collapsed to a 5-point scale. Finally,
the most frequent comments in each of the cells were combined into several
sentences emphasizing the distinctions between neighboring cells. The choice
was then made to eliminate the five separate dimensions from the final DORP
scale since the instrument had ultimately to yield a single rating for each
respondent. Descriptions of the five global scale positions were synthesized
from the columns of the matrix. Those descriptions were the ones provided to
the interviewers and are reproduced in Appendix 12.
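
The screening and collapsing steps described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not give the standard-deviation cutoff that was used, so `max_sd` is a hypothetical value, the ratings are made up, and the function names are introduced here for illustration. The collapsing map follows the text (positions 2 and 3 merge, positions 5 and 6 merge).

```python
from statistics import mean, stdev

# Mapping from the original seven-step scale to the collapsed five-point
# scale: positions 2 and 3 merge, and positions 5 and 6 merge (per the text).
SEVEN_TO_FIVE = {1: 1, 2: 2, 3: 2, 4: 3, 5: 4, 6: 4, 7: 5}


def summarize(ratings_by_sample):
    """Return {sample_id: (mean rating, standard deviation)}."""
    return {sid: (mean(r), stdev(r)) for sid, r in ratings_by_sample.items()}


def screen_and_collapse(ratings_by_sample, max_sd=1.5):
    """Drop samples with divergent ratings, then assign collapsed positions.

    max_sd is a hypothetical cutoff; the report does not state the value used.
    """
    result = {}
    for sid, (m, sd) in summarize(ratings_by_sample).items():
        if sd > max_sd:
            continue  # teachers disagreed too widely; eliminate the sample
        seven_step = min(7, max(1, round(m)))  # nearest seven-step position
        result[sid] = SEVEN_TO_FIVE[seven_step]
    return result


# Made-up ratings from five teachers for three speech samples.
example = {
    "sample_a": [2, 3, 3, 2, 3],  # consistent ratings, lower proficiency
    "sample_b": [1, 7, 2, 6, 4],  # widely divergent ratings
    "sample_c": [6, 6, 5, 6, 6],  # consistent ratings, higher proficiency
}
collapsed = screen_and_collapse(example)
```

In this made-up example the divergent sample is dropped and the two remaining samples land on collapsed positions 2 and 4. The content analysis of teachers' comments, which produced the actual scale descriptions, is a qualitative step that the sketch does not attempt to reproduce.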

d. The Elicitation Situation: The final aspect of the DORP to be defined
was the elicitation of the speech sample. This was a significant problem because
of the requirement that the situations be at least somewhat standardized over
the entire range of ages and ethnic groups. The general problem of obtaining
useful spontaneous language samples is well known by sociolinguists, and there
are apparently no easy solutions (cf. Wolfram and Fasold, 1974) even under the
best of conditions. It amounted to finding situations in which people with very
different backgrounds and interests would all talk with equal ease and volubility.
Unfortunately, even if that objective were achievable, we had no time to test
various procedures. Thus, the solution adopted was merely to have the interviewers
ask three open-ended questions of each respondent with further instructions to
add to those questions in any way that would be likely to get the respondent
talking. The questions were picked from among those that seemed most effective
when eliciting the speech samples used in the development of the rating scale.
They are included below.

ADULT QUESTIONS

1. "Could you take a second and think of the one person who has made a big
impression on you, and tell me as much as you can about that person. (pause)
I'll just listen, and you tell me. Take your time."

2. "Now, if you will, I'd like you to think back to one of the most exciting
experiences in your life. Tell me as much as you can about that experience."

3. "Now a final question. Take a second to think about this question. If
you could do anything you wanted to do today, what do you think you might do?
Tell me as much as you can about what you might do."

CHILD QUESTIONS 

1. "Could you take a second and think of your best friend, and tell me as much
as you can about that person. (pause) I'll just listen and you tell me. Take
your time."

2. "Now, if you will, I'd like you to think back to one of the most exciting
places that you've been to. Tell me as much as you can about that place."

3. "Now a final question. Take a second to think about this question. If
you could do anything you wanted to do today, what do you think you might do?
Tell me as much as you can about what you might do."

Figure 1: Form used by teachers to rate and comment on speech samples

SAMPLE ____

Please listen carefully and make any notes in the space provided below:
NOTES:



(If more space needed, please write on back of sheet.)

Please rate the sample on the basis of the child's likelihood of succeeding
in (or benefiting from) a monolingual English class (circle one).

1   2   3   4   5   6   7

(least likely)   (most likely)

Give as many reasons as you can for rating this sample the way you did:


3. The Monitoring System

In CAL's proposal to NCES, the need was expressed to develop an objective
behavior monitoring system to obtain data on the nature of the interactions
between interviewer and respondent during the asking and answering of MELP questions.
This was seen to be particularly important because of the possible cultural and
linguistic differences between monolingual English-speaking interviewers and
potential LESA individuals. (It was not at all clear at the time in what numbers Census
would be able to hire interviewers who were members of the ethnic-linguistic groups
involved.) CAL planned to have its staff members monitor the RTI interviews to
collect both objective and impressionistic data on strengths and weaknesses of the
questionnaire and procedure. Without exception, these monitors were members of
the research staff who had developed the MELP questions, the test, and the DORP
in San Francisco and had conducted many such interviews themselves; thus they were
well acquainted with the objectives of the project and the intended uses of the
instruments.

In mid-June, Dr. Jeanne Freeman was given the assignment of developing an
objective behavior coding system to monitor the interaction in interviews. The
remainder of this section is her report of the development activities.

The development work began with an extensive review of the literature on 
interaction analysis systems (e.g., Simon and Boyer, 1967; Rosenshine and Furst, 
1971; and Dunkin and Biddle, 1974) and the literature on non-verbal communication 
(e.g., Mehrabian, 1972). This review of the literature, coupled with
consultation with Dr. Jere Brophy of the University of Texas at Austin, led to the
selection and adaptation of verbal and non-verbal categories from already existing
systems and the development of categories appropriate for this specific study.

A preliminary set of categories was developed for verbal and non-verbal
behaviors. The non-verbal categories reflected major areas: proxemics (distance),
haptics (touching), kinesics (body movements), and oculesics (eye behaviors). In
addition, verbal categories were developed to differentiate and record various
phases of the interview. This initial list of behavioral categories was
submitted to the development staff (who represented various ethnic groups) in
San Francisco. The staff rated the categories in terms of appropriateness for
the different ethnic-linguistic groups. Although there were several categories
that were questionable, the first draft of the monitor's interaction analysis
system was developed, including definitions and examples of the categories.

This first system divided the interview into three sections: the
introductory/orientation phase, the questioning/answering phase, and the closing phase.
Each phase contained categories specific to that phase (i.e., in the
introductory/orientation phase, specific verbal and non-verbal greeting behaviors; in the
closing phase, specific verbal and non-verbal leave-taking behaviors). However, each
phase was also coded according to a single set of global rating scales developed
to assess high inference behaviors, such as responsiveness and tension.

The first set of categories for the introductory/orientation phase of the
interview included verbal greeting behaviors, such as exchange of pleasantries
and receptive-unreceptive comments, and non-verbal behaviors, such as distance
from interviewer, touching behaviors, and facing the interviewer. The global
rating scale coded at the end of this phase and at the end of each subsequent phase
included five-point rating scales representing general behaviors
(pleasant-unpleasant, responsive-unresponsive, tense-relaxed, tolerant-intolerant,
open-withdrawn, formal-informal).

The categories for the question/answer sequence, in which the interviewer
asked the census-type questions and the criterion measures, included four
five-point rating scales (willingness to respond, nervous-calm, brief-detailed,
positive-negative) to be completed for each item. Toward the close of the question-answer
sequence, the monitor rated the respondent according to the occurrence of specific
non-verbal behaviors (e.g., facial expressions, facing the interviewer, looking
toward the interviewer, leaning toward the interviewer, stiff posture, tense
hand/leg movements).

The categories for the closing phase of the interview included verbal
leave-taking behaviors, such as exchanges of personal information and comments to
maintain or close the interaction, and non-verbal behaviors, such as walking the
person to the door, distance from the interviewer, and touching behaviors. After
recording these behaviors, the monitor would code the respondent's behavior
according to the same global rating scales; however, in this phase, the monitor
recorded changes in global behaviors. For example, the monitor would check
pleasant-unpleasant for one of the following: a mixed pleasant/unpleasant response,
a change from pleasant to unpleasant, a change from unpleasant to pleasant, or
no change. Therefore, the monitor could infer general characteristics of the
respondents' behavior and record the general pattern of the entire interview for each
global category.

The first version underwent modification with the help of three CAL research
assistants* in San Francisco and resulted in a considerably simplified category
system: (1) the specific verbal and non-verbal greeting and leave-taking behavior
categories in phases 1 and 3 were eliminated, and the list of non-verbal behaviors
in phase 2 was substituted; (2) the specific non-verbal categories and the
global rating scales were collapsed somewhat. For example, rather than having separate
categories for nervous hand movements, nervous arm movements, nervous leg movements,
or nervous foot movements, these were collapsed into a category nervous
hand/arm/leg/foot movements. Also, since eye contact was so variable among ethnic groups,
the various eye contact behaviors were deleted and incorporated into a category
looking toward the interviewer. Pleasant and friendly were collapsed into one
category. These changes and modifications constituted the second draft of the
coding system.

* Evangeline Kamitsuka, Michael SamVargas, and Richard Chambers.

After arriving at the set of categories for the second version of the manual,
the research assistants refined the categories in the system by elaborating the
definitions and examples and by reducing the five-point scales to three-point
scales. During this phase of the development, the objectives of the monitoring
system were reassessed and to some extent reformulated. The objective of
assessing the validity of the respondent's answer remained; however, the objective of
assessing affective verbal and non-verbal interactions was considered of secondary
importance. Therefore, the categories were redesigned to focus strictly on the
respondent's answers to the MELP questionnaire and whether the interviewer achieved
the objective of the question (i.e., obtained the information called for by the
question). The second phase of the interview, the question/answer phase, became the
basic framework for the revised version, in which several categories were coded for
each question/answer unit.

The third version of the monitoring system involved structuring and
elaborating the question/answer phase, in which each question/answer unit would be coded
according to several categories. In the question/answer sequence, response and
detail remained as categories. In addition, several categories were added (other
answers, relevant answer, seek clarification, rephrase, and achieve objective).
The framework for these categories consisted of four three-point rating scales
(response, detail, nervous, and attentive) and five checklist categories (relevant
answer, rephrase, seek clarification, another answer, and achieve objective).
After developing definitions and coding procedures for these categories, the
staff practiced coding with this system. Preliminary testing led to further
changes: (1) deletion of the high inference categories (nervous and attentive),
(2) changing all categories except detail to checklist form, and (3) including
cases in which the interviewer or respondent uses his native language. Also,
to facilitate the actual coding, the categories were logically structured into
the following superordinate categories:

Problems of the respondent in making a response to a question

    Does not respond
    Answers with information irrelevant to the question
    Another person answers the question
    Respondent seeks clarification
    Respondent uses language other than English

Interviewer Behavior

    Interviewer rephrases the question with or without an
    explicit request from the respondent to do so
    Interviewer uses language other than English

General

    The objective of the question appears to have been achieved
    Amount of detail of information given by respondent in answering
    the question (insufficient, sufficient but minimal, more than
    sufficient)

These categories were selected to code only what the interviewer or respondent
said in English; questions or answers in translation were coded only as uses other
language. This procedure was required in order to standardize the monitoring,
because the monitors varied in their language backgrounds and some of them did not
speak the language of the ethnic-linguistic groups.

The final categories were incorporated into coding sheets designed to identify
each census question by a number and code word, so the monitor could readily
identify the answers to each of the questions. For example, the monitoring form
corresponding to MELP question #1 (date of birth) was:

#1 Date

    No response
    Irrelevant answer
    Another answer
    Seeks clarification
    Rephrase
    Int. other language
    Resp. other language
    Achieve objective
    Detail   1   2   3
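
For readers who want to tabulate such coding sheets, one possible record layout is sketched below in Python. It is an illustration only, not part of the project's materials: the class name `UnitCode`, the field names, and the category identifiers are assumptions that simply mirror the items printed on the form above, with the three detail levels taken from the category list (insufficient, sufficient but minimal, more than sufficient).

```python
from dataclasses import dataclass, field

# Checklist categories, named after the items on the monitoring form.
# The identifier spellings are hypothetical.
CHECKLIST = (
    "no_response",
    "irrelevant_answer",
    "another_answer",
    "seeks_clarification",
    "rephrase",
    "int_other_language",
    "resp_other_language",
    "achieve_objective",
)


@dataclass
class UnitCode:
    """One monitor's coding of a single question-answer unit."""
    question_number: int
    code_word: str
    detail: int  # 1 = insufficient, 2 = sufficient but minimal, 3 = more than sufficient
    checked: set = field(default_factory=set)  # checklist categories marked

    def __post_init__(self):
        unknown = self.checked - set(CHECKLIST)
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        if self.detail not in (1, 2, 3):
            raise ValueError("detail must be 1, 2, or 3")


# A made-up coding of question #1: the respondent asked for clarification,
# the interviewer rephrased, and the objective was then achieved.
unit = UnitCode(1, "Date", 2, {"seeks_clarification", "rephrase", "achieve_objective"})
```

Keeping the checklist as a set of marked categories, with detail as the only scaled item, matches the final form of the system, in which every category except detail was recorded in checklist form.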

In addition to coding each question-answer unit for the census questions,
monitors recorded comments about specific unusual occurrences, such as the
respondent not completing the interview, the respondent having auditory problems, the
respondent having difficulty reading flash cards, or the respondent being resistant
or inattentive. Also, monitors recorded any other circumstances that may have
affected the respondent's performance or would affect interpretation of the data.

Preliminary coding to establish interjudge reliability was done by Freeman,
Kamitsuka, SamVargas, and Chambers. Major disagreements on problems of definition
were resolved before establishing interjudge reliability on each category.
Reliability data for each category were based on the percentage derived from the formula

    unanimous agreements among judges (4)
    -------------------------------------
         occurrences of the category

For all categories, 80% agreement or above was established. It was felt these
percentages were sufficiently high to justify use of the system for the field test.
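
The reliability percentage can be illustrated with a short Python sketch. The report does not spell out how an "occurrence" was counted, so the sketch assumes that a category occurs in a question-answer unit whenever at least one judge checked it, and that a unanimous agreement is a unit in which all four judges checked it. The function name and the data are hypothetical.

```python
def percent_agreement(codings, category):
    """Percentage of occurrences of a category coded unanimously.

    codings: list of question-answer units; each unit is a list holding one
    set per judge with the categories that judge checked for the unit.
    Assumption: a category "occurs" in a unit if any judge checked it.
    """
    occurrences = 0
    unanimous = 0
    for unit in codings:
        marks = [category in judge for judge in unit]
        if any(marks):
            occurrences += 1
            if all(marks):
                unanimous += 1
    if occurrences == 0:
        return None  # the category never occurred, so no percentage exists
    return 100.0 * unanimous / occurrences


# Made-up codings by four judges for six question-answer units.
units = [
    [{"rephrase"}, {"rephrase"}, {"rephrase"}, {"rephrase"}],
    [{"rephrase"}, {"rephrase"}, {"rephrase"}, {"rephrase"}],
    [{"rephrase"}, {"rephrase"}, {"rephrase"}, {"rephrase"}],
    [{"rephrase"}, {"rephrase"}, {"rephrase"}, {"rephrase"}],
    [{"rephrase"}, {"rephrase"}, set(), {"rephrase"}],  # one judge differs
    [set(), set(), set(), set()],  # category absent; not an occurrence
]
agreement = percent_agreement(units, "rephrase")
```

With these made-up codings the category occurs in five units and is coded unanimously in four of them, which is exactly the 80% level reported as the minimum across categories.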

A final draft of the category system was developed for use in training the 
other staff members to use the system. Training included general overview and dis- 
cussion of the categories and practice coding using videotapes. Results of the 
reliability assessments were fed back to the participant coders and discussed. 
Training was completed before the staff left San Francisco for the various field 
test sites. A copy of the manual is appended to this report (Appendix 13).

Among the published accounts of using questionnaires to collect data on
language proficiency and use patterns (e.g., Lieberson, 1966; Mackey, 1966; Kelly, 1969;
Harrison, Prator, and Tucker, 1975; and Committee on Irish Language Attitudes
Research, 1975), the one relied on most heavily in the present project was that by
Fishman, Cooper, and Ma (1971). (Both Fishman and Cooper served as occasional
consultants on the project.) Generally, this literature indicated that
individuals can rate their own language proficiency fairly accurately (as compared
with their performance on tests), and that both their current use of the language
and their educational history involving the language correlate quite highly with
test scores as well. Thus, the initial foci of the MELP questions were five-fold:

A. Screening Questions. In chapter I of this report, the need for a set of 
screening questions was discussed. They were to define the pool of potential LESA 
individuals as characterized by PL 93-380. In particular, they were to determine: 

a. Place of birth

b. Usual language spoken by the individual

c. Usual language spoken by the individual's household

d. Parents' usual language (for children)

B. Self-rating Questions. These were questions asking the respondent to
directly evaluate his own ability to speak and understand English. Respondents were
also asked to rate their proficiency in their non-English language on the
possibility that proficiency in one language might be inversely related to proficiency in
the other. Proxy respondents were asked to rate another person in their household.

C. Language Use Questions. Assuming that proficiency in a language is directly 
related to the extent and variety of its use in various situations, a number of 
questions were tested which explored the respondent's usual language in the home, 
at school, at work, and with peers.

D. Educational History. Since the LESA concept is defined relative to
educational settings, questions were created dealing with:

(1) number of years of formal education completed

(2) country in which the education was received

(3) number of years in which English was the principal language
of instruction

(4) whether the individual had ever been informed by a school official
that his English was insufficient for educational purposes

(5) whether the individual had ever been held back in school (because
of deficiency in English)

(6) whether he or she had ever participated in a bilingual program

(7) whether he or she had been enrolled in school in the last year

E. Mass Media Questions. Several questions relative to the respondent's use
of various English language mass media were explored on the hypothesis that the
regular use of English mass media would imply proficiency in English. The converse,
of course, would not be a reasonable implication (i.e., that one not using mass
media was not proficient in the language).

Procedure. The procedure used in developing and testing these questions
was as follows: Dr. Terry Webb and Dr. Alberto Rey were principally involved in
producing drafts of the MELP questions. They were closely guided by Leslie
Silverman of NCES while he was on site. The questions went through so many editions
that it is not useful to try to trace their evolution in detail here; however,
several sequential versions of the questionnaire are appended to this report.
Generally, the procedure was as follows:

1. An edition of the questionnaire was produced and distributed to 
the various teams developing the tests. 

2. They would use the questionnaire for one or two days of inter- 
viewing in the Latino, Chinese, and Filipino communities. 

3. A meeting of the entire staff would be held in the late afternoon 
and the experience with each question in each ethnic group would be 
discussed in detail. 

4. Revisions would be made overnight and a new version typed,
reproduced and distributed to the teams by noon the next day.

5. Etc.

Since the questions in the NCES "Survey of Languages" had already gone to
press as part of the July 1975 edition of the Current Population Survey, they
were generally included in a form unchanged from the CPS. This would enable some
comparisons of their adequacy relative to some created by the CAL staff which
covered approximately the same topics.

Finally, on July 12, the then current version of the questionnaire was
reproduced for distribution at the July 13-14 meeting of the LGRs. That edition is
appended to this report (Appendix 14).

5. The Language Group Representatives

Selection. The nature and purpose of the LGR advisory committees demanded that
they be composed of individuals who were members of or who had worked with the
various linguistically different groups in the United States. Emphasis was placed
on identifying and selecting individuals involved in community work on the political,
social and/or religious levels. Similarly, attention was given to the selection
of participants who had had a chance to work in areas where the concept of education
had been actively discussed or been a major goal.

Due to the linguistically heterogeneous nature of the American populace, CAL
felt that a number of language groups had to be represented. Consequently, five
major language groups were identified with subgroups within each. The five major
language groups were Spanish, Chinese, East Asian/Pacific, Native American, and
European/Near Eastern. Equally important was that areas of the country where the
language groups were found should also be represented; the rationale was that a
language group in one part of the country did not necessarily have the same
background, goals, desires, needs and degree of English language proficiency as a
similar group in another part of the country. For example, Chicanos in Texas tend to
be located more in rural areas and have perhaps more ties to the Spanish language
and culture than their counterparts in the Midwest. For this reason, a relatively
large language advisory committee was assembled. Consequently, advisors were drawn
from (1) specific dialects/languages within each of these language groups and from
(2) various areas of the country where these languages/dialects were represented.

The suggested plan called for a representative group of Spanish-speaking
Mexican Americans from the West Coast, Texas, and the Midwest; Puerto Ricans from the
East Coast and Chicago; and another group from the Cuban, Dominican, and Central
American communities. Organizations like L.U.L.A.C., National Task Force de la
Raza, El Congreso, Mexican American Council on Education, United Migrant League,
ASPIRA, Puerto Rican Legal Defense Fund, and other Spanish-speaking
community-based and/or -minded organizations served as sources or contacts for this advisory
board.

In addition, an advisory board was selected to incorporate the Chinese
perspective. Representatives from West Coast, East Coast and Chicago community
organizations were invited to assist in the tasks for this board. Likewise,
representatives from the East Asian/Pacific language groups were identified and involved. The
Korean, Vietnamese, Japanese, Filipino and Samoan communities were canvassed for
advisory board representation.

The Native American advisory board was made up of a representative group of
Navajo, Sioux, Mikasuki/Seminole, Papago, and Eskimo, as well as representation
from the Northwestern tribes. Organizations like the National Congress of American
Indians, National Indian Education Association, United Sioux Tribes, United
Southeastern Tribes, United Indians of All Tribes Foundation, and the Navajo Division
of Education were identified as sources or contacts for this board.

Finally, the European/Near Eastern perspective was incorporated by including 
representatives from the French (New England, Louisiana, Haitian), Italian (East 
Coast), Portuguese (New England), Greek, Polish (Chicago), Serbo-Croatian and 
Arabic (Detroit) language communities.

The above national groups reflected an approximate total of 45 individuals 
who were invited to form the advisory committees. The geographical areas of con- 
centration which were identified were in no way fixed; rather, these were areas 
which, based on current census data, seemed to have a significant number of the 
aforementioned population groups.

6. Role in Instrument Development and Use



LGR Meeting #1. The first LGRs were scheduled for meetings June 10 and 11.
Subsequent groups came to CAL offices in Arlington, Virginia every two days until
June 18-19 (Spanish, Native American, Chinese, Asian/Pacific and European).

The morning of the first day was spent introducing the LGRs to CAL and its
work and NCES and its work. CAL's involvement in the NCES project was outlined
carefully. Moreover, the project was placed in perspective relative to current
legislated mandates. Likewise, the project was discussed at length to ensure that
the LGRs understood what the consequent MELP would do and not do, and the purposes
of its use.

The afternoon session was devoted to several points of discussion. First,
the concept "instructional/educational difficulty" (quoted from current legislation
regarding bilingual education) was introduced, and attempts were made to arrive at
a group definition. Then, several reports were given which focused on past and
current language assessment in the represented LGR communities.

The second day was devoted to a review of current research regarding theory
and practice in language testing. This was supplemented by a review of effective
and tested sociolinguistic field methods. Discussion focused on the consequences
of "mistakes" in data gathering.

Criterion and candidate MELP measures for language assessment were then
introduced and discussed. It was pointed out that project items or measures could not
be of a criterion type; rather, they had to follow "census type" questions.
Nevertheless, LGRs were asked to consider the initial battery of criterion measures and
assess them for their face validity. Finally, the LGRs were given an opportunity
to make recommendations regarding potential cultural and linguistic biases in the
MELP format and items (for those proposed for the initial field testing).
Likewise, recommendations were accepted regarding current, sensitive guidelines to be
followed in order to facilitate all data collection.

Every LGR meeting followed basically the same agenda and content. Gil Garcia,
Leann Parker, Dr. William Leap, Diana Riehl, and Dr. Roger Shuy collaborated in
these efforts. (See Appendix 15 for LGR reports.)

LGR Meeting #2. Although the LGRs made preliminary comments about the kinds of
instruments that would be appropriate for their respective groups (both MELP ques- 
tions and criterion instruments) during their initial meetings in June, their main 
opportunity for concrete input to the project came during Meeting #2 in San Fran- 
cisco on July 13-14. Upon arrival they were given packets containing all of the 
instruments developed in the pilot activities (discussed in the preceding sections 
of this chapter). The first morning was spent in a general briefing by Walter Stolz 
on the activities to date, the design of the field test, and a review of the 
general objectives of the MELP project and the SIE. Then Earl Gerson of the Bureau 
of the Census briefed the group on the general sampling plan to be used in the SIE. 

In the afternoon, the CAL staff acquainted the LGRs with the instruments and 
the general interviewing procedures to be used in the field test. This was done 
by role-playing interviews using the LGRs as respondents. Video tapes of several 
interviews made in the last days of the pilot work were also shown.

During the remainder of the conference intensive discussions were held within 
each area group of LGRs relative to specific aspects of the instruments which 
should be modified or eliminated. Each representative was asked to submit an 
individual critique of all materials; however, each group also prepared a single 
report to be presented to the conference as a whole. These reports were presented 
and discussed on the last afternoon (see Appendix 15). As can readily be seen,
they range from comments on individual items to critiques of the government's
philosophy toward bilingualism and bilingual education.



During the ten days between the LGR meeting and the beginning of the field
testing in Miami and El Paso, both MELP questions and criterion instruments
underwent considerable change. The MELP questions were revised in group session by
Stolz, Webb, and Troike of CAL, Horvitz and Weeks of RTI, and Dr. Dorothy Waggoner
of NCES. The final field test questions are reproduced as Figure 1 in Chapter V
of this report. The tests were revised by Strick in cooperation with the RTI
graphics department. They are appended to the report.* Some specific changes in
the instruments stemming from the LGRs' input were:
1. The MELP Questions

a. Some questions were included to probe the respondent's
knowledge of his first language as well as his knowledge of
English (e.g., questions 9, 10, 11, 15).

b. On questions calling for a proficiency rating, the
negative connotations of the lower steps were removed.

c. Question 4 was changed in accord with a suggestion from
the Chinese group.

d. Questions were asked separately about newspapers,
magazines and books.

e. A question about the language used at work was included,
as well as some questions about type of work.

f. Several questions were removed which seemed to have little
to do with English proficiency.

2. The Adult Production Test (Illyin)

a. All pictures were redrawn to make them look more
professional.

b. The beach scene was eliminated, and a scene in a
park was substituted.

3. The Mat-Sea-Cal

a. All pictures were redrawn.

b. An item involving a monkey climbing a tree was
eliminated, and another item was substituted ("It's on the
corner").

4. The OCT

a. Stick figures were redrawn more realistically.

b. The administration procedure was simplified.

5. The ACT

a. An additional example was incorporated into the
instructions.

* See Appendix 9

LGR suggestions about interviewing personnel were followed by hiring
approximately one-half of the interviewers in each site from the ethnic group
being surveyed. Also, a more thorough orientation-training program was carried
out for each site, lasting three days instead of two as originally planned. In
training interviewers for the Navajo site, Dr. Robert Young of the University of
New Mexico was brought in for two days to provide a general orientation to
Navajo culture.

A concern about speed of responding to the tests was expressed by the Native
Americans in particular. They thought that many Navajos might require more than
the usual time limit of 10 or 20 seconds per item to respond with the correct
answer. Thus, the interviewers were instructed to allow as much time as the
respondent needed to give an answer.

Site Visits by LGRs. Several LGRs monitored the field test activities in the
various sites. They traveled with one or more interviewers on their rounds and
then made a report to the RTI supervisor and the CAL monitors. Suggestions for
changes in procedure were referred to RTI's and CAL's central offices. LGR
visitations included:

Miami: Willy Gort
       G. Kousoulas
Arizona: Dillon Platero
         Fi^^l ^^^^ila
San Francisco: Ling Chi Wang
               Danilo Begonia


No LGRs visited El Paso because the work had ended there before a schedule
could be set up. Dr. Robert Young spent a day monitoring interviews in Arizona as
an expert in Navajo culture. Each of the above LGRs reported back to their
respective groups at LGR Meeting #3. (See Appendix 15 Reports.)

Meeting #3. The third LGR meeting was held in Arlington on September 3-4, 1975.
The main purpose of the meeting was to brief the representatives on the field test
procedures and preliminary results and to obtain general suggestions with respect
to analyses and interpretations of the data. The proceedings of that meeting are
appended to this report.

At the time the meeting was held, virtually complete data from Miami and El
Paso were in the computer; however, only about one-third of the data from the other
two sites had been processed into computerized form. Using the data available,
frequency distributions and cross tabulations of MELP questions vs. test scores were
constructed and distributed to the representatives. Stolz explained this material,
and discussion ensued, both in the plenary session and in groups, about how these
results would be used to produce a MELP instrument and how that instrument would
be used to categorize people as either LESA or non-LESA. Summaries of these
discussions may be found in the proceedings. (See Appendix 15 Reports.)

IV. Field Testing the Instruments

1. The Basic Design 

A principal step in the development of any instrument is the field testing
phase. In a field test, the instrument is used in a context as close as possible
to that in which it will eventually be employed in the survey proper, but additional
data are also collected which allow for an evaluation of the trial instrument's
performance. The most important evaluation which could be made in a case such as
the present one is of concurrent validity, and that was in fact the primary
objective here. Concurrent validity was evaluated by correlating the items in the
trial MELP instrument with several "criterion" measures of English proficiency.
The development of two such instruments, the test and the DORP, has already been
described in detail. The obvious way of obtaining correlations of MELP items and
criterion measures is simply to collect all measures in a single interview and then
compute correlations for all possible pairs of these variables. This is what was
done with the MELP items and the test and DORP, using a concurrent measurement
validation design.
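
The pairwise correlation step described above can be illustrated with a minimal
sketch. The variable names and all of the data values below are hypothetical,
invented only for illustration; the report does not state which correlation
coefficient RTI computed, so the Pearson coefficient is assumed here.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one entry per respondent, all collected in one interview.
measures = {
    "melp_q1": [1, 2, 2, 3, 4, 4],          # quantified response to a MELP item
    "melp_q2": [0, 0, 1, 1, 1, 1],          # a yes/no MELP item coded 0/1
    "test_score": [5, 9, 12, 15, 19, 20],   # criterion: proficiency test score
    "dorp": [1, 1, 2, 3, 4, 4],             # criterion: DORP rating
}


def correlation_table(data):
    """Correlate every pair of variables, as in a concurrent design."""
    return {
        (a, b): pearson(data[a], data[b])
        for a, b in combinations(data, 2)
    }


table = correlation_table(measures)
for (a, b), r in table.items():
    print(f"{a:>10} vs {b:<10} r = {r:.2f}")
```

In such a table, the entries of interest are those pairing a MELP item with a
criterion measure; items with consistently high values would be the candidates
for retention.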

When the criterion variable is not continuous but rather categorically defined,
a known groups validation design is possible. In this design, respondents are
chosen for participation in the study on the basis of their having been identified
as belonging to one or another category of the criterion variable before the field
test instrument (the MELP) is administered. A known groups design was possible in
the present study because school systems serving populations that include
considerable numbers of children with native languages other than English screen
such students for participation in special English-as-a-second-language or
bilingual education curricula. Such screening procedures constitute local
operational definitions of the concepts LESA and non-LESA in the sense that
"passing" such a screening procedure is taken by the school as evidence that the
child can succeed in a standard monolingual English instructional environment,
i.e., he is not LESA. Conversely, if the results of the screening procedure suggest
the advisability of enrolling the student in special programs, this is equivalent
to indicating that the student might encounter some "instructional difficulty" in
the regular curriculum, i.e., he is LESA. To the extent that such screening
procedures are well-constructed for their purpose, they produce appropriate known
groups against which the MELP can be validated. They are particularly valuable in
that they provide a non-arbitrary cutting point between LESAs and non-LESAs on the
continuum of English proficiency: non-arbitrary because the cutting point is
implicitly referenced against the school's curriculum.

The disadvantages of using the results of such screening procedures as criteria
in our study revolve around the fact that they differ from school district
to school district and perhaps from school to school. For example, some districts
rely on interviews by specialists; others use standardized testing. Still others
arbitrarily place the child in a regular classroom and then ask the teacher to refer
him or her to special programs as the need arises. Some districts focus only on
English proficiency; others take into account proficiency in the home language as
well. Of course, the labels attached to the results of the screening also
vary. They include references to "English-language limitation", to "English
independence", to "language dominance", etc.

Beyond the formal definitions of the screening procedures, there is the actual
practice of them, which can be of concern to a researcher. An external observer
can only guess at the informal factors that might be operating to affect the
screening processes. Are the bilingual services badly overcrowded? This could lead
to lowering the implicit cutting point between LESA and non-LESA so as to provide
justification for placing more children in regular classrooms. Are the schools
currently receiving funds on the basis of how many children need special services?
That could lead to the opposite tendency: screening procedures which would
demand a high level of proficiency for a non-LESA classification. Does the faculty
posit a dominant view of the mental capacities of a given ethnic group? And so
on. It is virtually impossible to evaluate the extent to which such factors play
a role in the way a given screening procedure is actually operated. What is
clear, however, is that we can expect each school district to have its own unique
screening procedure. Not only can we expect the cutting point between LESA and
non-LESA to be variously placed in different school systems, but we can expect
the continuum of English proficiency itself to be defined in various ways in the
different locations. Thus, it would not be at all surprising to have the
relationship between the screening procedures and our test and DORP be noticeably
different in different locations.

What can be said about whether one school screening procedure is "better" than
another? The research literature is not useful on this issue because there is no
absolute scale or standard of English proficiency against which to compare them.
The strategy adopted here was to ask various state education agencies to recommend
local districts that had exemplary screening programs relative to our purposes.
Then the local school districts were contacted directly and asked to participate
in the study. Their participation was to consist of providing NCES with "the names
and addresses of up to 500 children who have been screened, about half of whom have
been determined to need special programs and half of whom have been determined not
to need them" (from a letter to the superintendents of various school districts
from NCES).

This method of obtaining samples differed markedly from the sampling
methodology originally proposed by RTI and CAL in their proposals to NCES.
Those proposals suggested an informal cluster sampling procedure wherein
interviewers would simply canvass neighborhoods known to contain high
concentrations of the ethnic groups of interest. Screening questions would be asked
upon first contact with a member of a household, establishing the ethnic and
linguistic backgrounds of the persons in the household. The interview would be
continued, then, only for households that met certain screening conditions. During
the course of interviews in the households of interest, permission would be sought
to obtain information from the schools about the children in the household. After
lengthy discussion during the week of June 16, it was decided that beginning with
the schools and asking them to provide list samples was more efficient and more
directly targeted on the objectives of the field test. It was also decided that
NCES would make the contacts with the state and local education agencies.

Sampling in Different Age Ranges. Fisher's design specifications indicate that
individuals of all ages were of interest to the Congress but that there was special
interest in ages 5 to 17. However, NCES learned that screening programs and
special curricula for secondary school students were largely nonexistent or
underdeveloped in most schools. The implicit philosophy seemed to be that helping
the youngest children was most crucial and that older students either
already knew a good deal of English or would learn it quickly given a minimum of
assistance. As a result of this situation it was decided to limit the sampling
of "children" to ages 5-13. This also coincided with the definition of "child"
that was to be used in all other parts of the SIE questionnaire (i.e., the income
and health-welfare sections); that is, in the SIE there were two questionnaires with
some identical items, one to be asked of individuals 0-13 years and the other to
be asked of individuals 14 and over. Thus, it would be particularly convenient to
Census if the MELP could conform to that format as well. The letters sent to
schools, then, asked for lists of children enrolled in elementary schools.

But what about the sampling of adults (14 years and older)? Was a known
groups validation design possible for them? Did there already exist
classifications of adults as being LESA and non-LESA? One source of such
classifications might be adult education programs. Such programs routinely employ
some sort of placement procedure for people with non-English backgrounds, and the
resulting placement can be interpreted as a classification of an individual as
either LESA or non-LESA. A difficulty with sampling from adult education programs
is the self-selection factor. Clearly, those who voluntarily seek out an adult
education program are not a random sample of any general population. Moreover, that
population would not normally include any individuals between the ages of 14 and 18.
Thus, adult education samples would exclude secondary-school students (who were
also excluded from our child sample). Nevertheless, since no other a priori source
of LESA and non-LESA categorizations could be found, the decision was made to ask
school districts for "lists of names of up to 500 adults from foreign language
backgrounds who are enrolled (or have been enrolled very recently) in adult basic
education programs, including English as a second language if these are sponsored
by your school district" (letter from NCES to school districts).

This, then, was the overall design of the field test as it evolved during the
June discussions in San Francisco. The samples would be drawn from lists of
pre-screened children aged 5-13 provided to us by school districts with large
concentrations of students having non-English language backgrounds. Separate lists
of adult education program participants were also requested. The particular list
from which an individual was drawn (LESA or non-LESA) then became a primary piece
of criterion information about that individual, along with his or her test score
and DORP rating. (Interviewers were not informed of which list a respondent was on,
i.e., all interviewing was done "blind" with respect to list membership.)

Choosing the Ethnic-Linguistic Groups to Participate in the Field Test. In RTI's
proposal to NCES, field testing was suggested in the following groups: Cubans
(Miami), Puerto Ricans (New York City), French (Manchester, New Hampshire),
Chicanos (San Antonio), Navajos (Gallup, New Mexico), and Chinese (San Francisco).
However, the revision of the sampling procedure required the reconsideration of
all sites. Underlying the original choices was the requirement of sampling both
from some of the largest groups in the U.S. having relatively high proportions of
limited English speakers and from a culturally wide range of groups. Attempting
to honor these requirements to as great an extent as possible, NCES approached the
Texas, Florida, California, Arizona, New Mexico, New Jersey, and Massachusetts
education agencies for their cooperation and for suggestions about which school
districts in their states would be most appropriate to approach. The
Navajo Nation was also contacted for its suggestions. Negotiations for obtaining
lists were begun with the Dade County (Miami), El Paso, Camden, San Francisco, Tuba
City (Arizona), Window Rock (Arizona), and Ganado (Arizona) public school systems.

Eventually, lists of children were obtained from Dade County (Cubans), El Paso 
(Chicanos), San Francisco (Asians), Window Rock (Navajos) and Ganado (Navajos). 
The San Francisco Independent School District agreed to supply names of both Chinese 
and other Asian children in about equal numbers. Lists of adults enrolled in adult 
education programs were obtained only from Dade County and El Paso. Thus, the field 
test was held in four locations (Window Rock and Ganado are adjoining districts), 
and drew from five ethnic-linguistic groups - Cubans, Chicanos, Navajos, Chinese, 
and other Asians. 



A problem of finding adult respondents in the Navajo and Asian groups still
remained. It was finally decided to sample adult respondents from the homes of the
child respondents in those sites. This had the advantage of being cost-efficient
but had the disadvantage (from a sampling point of view) of only drawing adults
from households containing children of elementary school age. The plan in those
sites for selecting an adult respondent in a given household was as follows: first,
the interviewer was to construct a household roster listing the name and age of each
household member, and, second, she was to randomly choose one of the adults (age
14 and over) using a table of random numbers. This would give representation in
the adult sample to all age groups over 13, including persons 14-18 who were not
represented in the Cuban and Chicano samples.
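
The two-step selection rule just described (build a roster, then pick one adult
at random) can be sketched as follows. This is an illustration only: the roster
is invented, and a seeded pseudorandom generator stands in for the printed table
of random numbers that the interviewers actually used.

```python
import random


def select_adult(roster, rng):
    """Pick one adult (age 14 and over) at random from a household roster.

    `roster` is a list of (name, age) pairs; returns None if no adult is listed.
    """
    adults = [member for member in roster if member[1] >= 14]
    if not adults:
        return None
    return rng.choice(adults)


# Hypothetical household roster; names and ages are invented.
roster = [("A", 41), ("B", 38), ("C", 16), ("D", 9), ("E", 6)]

rng = random.Random(1975)  # fixed seed only so the sketch is repeatable
dar = select_adult(roster, rng)
print("Designated adult respondent:", dar)
```

Because every listed adult has the same chance of selection, a 16-year-old in
the household is as likely to be chosen as a parent, which is how persons aged
14-18 enter the adult sample under this plan.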

2. The Accuracy of First-hand Data and "Proxy" Data

A focus of the field test was to investigate whether one adult in the household
could give accurate answers to questions about another adult in the household,
especially with regard to English proficiency. Such responses will be called
proxy data, and it was desirable to compare their quality, relative to the criterion
measures, to the quality of first-hand data. This is important in the context of
the SIE because of Census' preference for talking to only one adult in each
household (the Household Respondent) and obtaining information about all members of
the household from him. In order to address this question, interviewers were asked
to obtain both first-hand and proxy responses to the MELP questionnaire whenever
there were two adults present in the household.

3. The Language Ability of the Interviewer

Another concern about the accuracy of the data revolved around the fact that
monolingual (English speaking) interviewers would inevitably be dealing with
respondents whose English proficiency ranged from excellent to none. And, in
addition to the linguistic factor, there was also the cultural difference between
the monolingual, probably Anglo, interviewer and the ethnically distinct respondent.
This difference could easily take its toll on the rapport between the two and thus
influence the character of the data collected. In order to evaluate the severity
of these problems, one component of the design of the field test was to compare
the data collected by monolingual (English) interviewers and bilingual interviewers
whose native language and ethnic origin were those of the respondent. This was
done by matching the assignments of monolingual with bilingual interviewers in each
site through randomizing the names and addresses of the individuals they were to
interview.
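
One simple way to randomize assignments between two interviewer types is
sketched below. The report does not spell out RTI's actual assignment
procedure, so the shuffle-and-alternate scheme, the function name, and the case
identifiers are all assumptions made for illustration.

```python
import random


def assign_cases(cases, rng):
    """Randomly split a site's cases between the two interviewer types.

    The case list is shuffled, then dealt out alternately so the two
    groups are of (nearly) equal size and differ only by chance.
    """
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return {
        "monolingual": shuffled[0::2],
        "bilingual": shuffled[1::2],
    }


# Hypothetical case identifiers standing in for names and addresses.
cases = [f"case-{i:03d}" for i in range(1, 21)]
assignment = assign_cases(cases, random.Random(0))
for kind, group in assignment.items():
    print(kind, len(group), group[:3])
```

With assignment left to chance, any systematic difference between the data
collected by the two groups can more plausibly be attributed to the
interviewers than to the respondents they happened to receive.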

4. The Interviewing Procedures 

All data collection and analysis activities associated with the field test,
from the recruiting of interviewers to the statistical analysis of the data, were
the responsibility of the Research Triangle Institute under a subcontract
arrangement with CAL. The following description of the field procedures is taken
from pages 24-27 of RTI's final report of their subcontract activities. The "CQ"
referred to is the Census-style questionnaire containing various demographic and
candidate MELP questions.



Interviewer assignments were prepared by the site supervisory teams,
following detailed procedures designed by RTI's Sampling Department to (1)
equalize the effort for children and adults; (2) equalize the effort for
each child or adult's proficiency level defined by the schools (e.g., in
Miami: non-independent, intermediate, and independent); (3) increase the
precision of the comparison between bilingual and monolingual interviewers;
and (4) randomize the subsample of interviews to be monitored by the CAL
staff.

The field procedures followed by the interviewers during the field test
are detailed in the interviewer's field manual, a copy of which is included
in the attachment to this report. The procedures for the three principal
types of cases are summarized below:

Designated Child Respondents (DCRs)

(1) The interviewer calls in person at the sample household
    at a time when a household respondent (household
    member at least 14 years old) is likely to be home.

(2) The interviewer locates a household respondent and (a)
    introduces herself, (b) verifies that the DCR is a
    household member, and (c) explains the study.

(3) The interviewer administers the Census Questionnaire (CQ)
    and Household Information Form (HIF) to the household
    respondent. (NOTE: The household respondent responds to
    the CQ on behalf of the DCR.)

(4) The interviewer determines the age of the DCR.

(5) The interviewer interviews the DCR. (NOTE: If the DCR
    is ten or older, the interviewer administers the CQ and
    criterion measures; if the DCR is nine or younger, the
    interviewer administers only the criterion measures.)

Designated Adult Respondents (DARs) from School Lists (Miami and El Paso)

(1) The interviewer locates a household respondent as for DCRs
    above.

    (NOTE: The household respondent can also be the DAR, if
    the DAR is the first person 14 or older the interviewer
    encounters.)

(2) The interviewer administers the CQ and HIF to the household
    respondent.

    (NOTE: The CQ is second-hand if the household respondent is
    not also the DAR; first-hand if the household respondent is
    the DAR.)

(3) The interviewer interviews the DAR.

    (NOTE: If the household respondent is the DAR, the CQ will
    have already been administered and the interviewer continues
    with the criterion measures.)

Designated Adult Respondents (DARs) Randomly Selected from DCR
Households (N.E. Arizona and San Francisco)

    The interviewer locates a household respondent, as above.
    The interviewer then randomly selects an adult member of
    the household, who becomes the DAR. The interviewer then
    proceeds to interview the household respondent, DCR, and
    DAR as described above.

A number of minor procedural changes and refinements were made as the
fieldwork progressed and problems became apparent. One notable change that
was implemented near the end of the fieldwork period concerned obtaining
second-hand CQ information on adults. In order to increase the number
of cases where second-hand CQ data were obtained on DARs, interviewers
were instructed to attempt to find a household respondent who was not
also a DAR. One callback was authorized to accomplish this, if necessary.

Respondents were paid cash incentives by the interviewers at the rate
of $2.00 for each completed CQ and $2.00 for each completed set of criterion
measures. Incentive payments made directly to DCRs were made with the
knowledge of a responsible adult member of the household. No payment was
made for the short HIF, which was completed in conjunction with the initial
CQ.

Interviewers were instructed to make up to two calls at a sample household
in order to contact a household respondent. If the interviewer was
unable to contact a household respondent on the first call, she would attempt
to find out from neighbors when the household residents were most likely
to be found at home, and make her second call at that time. If neighbor
information was unavailable, the interviewers were instructed to make the
return call after 6:00 p.m. on a weekday or on a weekend. After initial
contact, the interviewer was allowed up to two more calls to complete
interviewing in the household. If she had still not completed her work at
the household after two additional callbacks, she was instructed to discuss
the case with a site supervisor immediately.

The interviewers were not permitted to substitute non-sample persons
for designated respondents. All non-interview cases had to be discussed
with a site supervisor, who would determine what, if any, additional action
should be taken. If no further action was warranted, the supervisor would
approve the noninterview result and provide the interviewer with a substitute
case, according to the interviewer assignment procedures developed by
RTI's Sampling Department.

The two RTI supervisors in each site remained in the field during the
fieldwork period in order to monitor closely the data collection activities
of the interviewers. The supervisors normally met with each interviewer
at least twice a week to review the status of each of her active cases and
to advise and assist her as necessary. The supervisors were responsible for
editing and approving the instruments associated with each completed case
and for mailing completed cases to RTI on a flow basis. Additional cases were
assigned to interviewers when appropriate, following procedures specified
by RTI's Sampling Department. The supervisors were also responsible for
validating the fieldwork by contacting at least ten percent of each
interviewer's respondents (those not monitored by CAL staff) to verify that the
interviewer had conducted the interview properly and that the respondents
had been paid. Other responsibilities of the site supervisors included
monitoring interviewer costs; controlling the issuing and retrieving of
advances to interviewers for use in making cash payments to respondents;
recruiting and training replacement interviewers, as necessary; maintaining
records on the handling and status of each case; and reporting to RTI at
least weekly the status of the fieldwork in the field test site.

Interviewer training and interviewing were begun in Miami and El Paso on
July 21 and in Arizona and San Francisco on July 28. Data collection was completed
on August 16 in Miami and El Paso and on August 23 in Arizona and San Francisco.
The results of these efforts are discussed in detail in Section VI.H of RTI's
final report, but Table 1, reproduced from that report, summarizes statistics on
the numbers of interviews attempted and completed in each site, along with measures
of the amount of effort expended to obtain them.
(Refer to Table 1.)

5. Monitoring of Interviews 

CAL personnel monitored approximately 15% of the interviews in each site for
two reasons:

1. To observe and report on the interaction between interviewer and respondent
during the asking and answering of each potential MELP question, for the
purpose of evaluating and improving the questions.

2. To ensure that the interviewers were following recommended procedures
and, if necessary, to recommend any modifications of those procedures to
RTI and CAL supervisory personnel.

CAL monitors were randomly assigned to interviewers on a daily basis and simply
accompanied the interviewer on his or her rounds for the day. The behavior
observation system described in Chapter III was filled out for each administration
of the "CQ", first-hand or proxy. Upon the completion of the field work, each
monitor submitted a summary report, either written or verbal, focused on the
aspects of the interview procedure that seemed to work well, those that worked
badly, etc.

The following CAL staff were assigned to the various sites:

Dade County

Dr. Alberto Rey - CAL site supervisor
Pedro Ruiz
Cynthia Lindsey
Roberta Mailman

El Paso

Amador Bustos - CAL site supervisor
Dr. J. Terry Webb
Gloria Lozano
Benjamin Zambalas

Arizona

Carolyn Karelitz - CAL site supervisor
Evangeline Kamitsuka
Annie Panlibuton
Claire McKenzie

San Francisco

Anna Lai - CAL site supervisor
Michael SamVargas - CAL site supervisor
Jennie Yee
Margaret Robbins

Table 1

DATA COLLECTION RESULTS OF FIELD TEST 1/

                                                               San
                                   Miami  El Paso  Arizona  Francisco   Total

Potential Respondents
  Assigned 2/                      1,079    1,071      972      1,192   4,314

Interviews with Children             335      426      358        353   1,472

Interviews with Adults               333      265      315        319   1,232

Total Interviews                     668      691      673        672   2,704
  (Percent)                        (62%)    (65%)    (69%)      (56%)   (63%)

Refused                               26       18       16         54     114
  (Percent)                         (2%)     (2%)     (2%)       (5%)    (3%)

Other Nonrespondents 3/              385      362      283        471   1,501
  (Percent)                        (36%)    (34%)    (29%)      (40%)   (35%)

Total Nonrespondents                 411      380      299        525   1,615
  (Percent)                        (38%)    (35%)    (31%)      (44%)   (37%)

Total Hours Charged 4/             2,916    2,992    3,203      2,917  12,028

Total Miles Driven 5/             22,966   21,079   34,328      8,299  86,672

Average Hours Per Interview          4.4      4.3      4.8        4.3     4.5

Average Miles Per Interview         34.4     30.5     51.0       12.4    32.1

% of Adult Respondents with
  2nd Hand Census
  Questionnaires 6/                  36%      36%      83%        36%     48%

1/ Figures in this table are based upon manual counts and computations by
   interviewers and supervisors and have not been verified by machine
   tabulations.

2/ In Miami and El Paso both children and adults were assigned to interviewers.
   In Arizona and San Francisco only children were assigned, since no adult
   lists were obtained for these sites. Interviewers randomly selected an adult
   from each sample child's household in those sites. For Arizona and San
   Francisco, therefore, the number of potential respondents was twice the
   number of sample children assigned.

3/ Examples of "other" nonrespondents include cases where the sample member had
   moved to another city; where the address was nonexistent; where the sample
   member could not be contacted at home in the prescribed number of
   interviewer visits; where the sample member was out of town; or where he was
   sick, institutionalized, or otherwise unavailable.

4/ Includes training time.

5/ Includes mileage incurred in connection with training.

6/ Figures shown indicate the percent of adult respondents in each site about
   whom Census Questionnaire data were obtained from a household member other
   than the respondent as well as from the respondent himself.
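
The row totals and percentages in Table 1 can be cross-checked with a short
script. The sketch below is illustrative only: the counts are transcribed from
Table 1 (with illegible cells restored from the row totals), the dictionary
layout and function names are assumptions, and percentages are taken relative
to the number of potential respondents assigned.

```python
# Counts transcribed from Table 1 (manual field counts, per its footnote 1).
table1 = {
    "Miami": dict(assigned=1079, children=335, adults=333, refused=26, other=385),
    "El Paso": dict(assigned=1071, children=426, adults=265, refused=18, other=362),
    "Arizona": dict(assigned=972, children=358, adults=315, refused=16, other=283),
    "San Francisco": dict(assigned=1192, children=353, adults=319, refused=54, other=471),
}


def pct(part, whole):
    """Percentage of `whole`, rounded to the nearest whole number."""
    return round(100 * part / whole)


def reconcile(row):
    """Return (interviews, nonrespondents, surplus over assigned cases)."""
    interviews = row["children"] + row["adults"]
    nonrespondents = row["refused"] + row["other"]
    return interviews, nonrespondents, interviews + nonrespondents - row["assigned"]


discrepancies = {}
for site, row in table1.items():
    interviews, nonresp, surplus = reconcile(row)
    discrepancies[site] = surplus
    print(f"{site:<14} interviews {interviews:>4} ({pct(interviews, row['assigned'])}%)"
          f"  nonrespondents {nonresp:>4} ({pct(nonresp, row['assigned'])}%)"
          f"  surplus {surplus}")
```

With the figures as transcribed here, the three other sites reconcile exactly,
while the San Francisco interviews and nonrespondents sum to five more than the
cases assigned; the table's first footnote already cautions that the counts
were not machine-verified.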






6. Visits to the Sites by CAL Central Staff 

During the course of the field work, each site was visited by at least one 
member of the CAL central staff. The objects of these trips were: 

1. To interview CAL and RTI field personnel in depth to learn about and
resolve any procedural or coordination difficulties between the two staffs.

2. To interview local school officials in depth to gather information
relevant to the screening procedures which formed the basis for the list
samples.

The trips made were: 
Miami: Robert Pearl (CAL consultant) 
Jeanne Freeman 
Walter Stolz 
El Paso: Jeanne Freeman 
Arizona: Walter Stolz 
San Francisco: Rudolph Troike 

7. Editing, Coding, and Entering the Data into Computerized Files

The details of this process may be found in Section IV.I of RTI's final report.
Basically, the procedure involved several stages of checking and editing the
completed interview materials and then entering the data directly into computerized
files through the use of a terminal. The confidentiality procedures employed during
these phases of the work are described in Section IV.J of RTI's report. The data
entry procedures were completed during the week of September 8. All of the
statistical analyses performed on these data were implemented by the RTI
statistical staff under CAL's direction.

V. Preliminary Analyses: Selection of the MELP Questions



From the point of view of Census Bureau field operations, the optimal MELP
was a small set of simple questions which could be asked by the interviewer about
each member of the household. Ideally, all such information would be obtained
from the Household Respondent. As conversations with Census and NCES progressed
during the first several months of the project, it became very clear that any
direct measure of proficiency, such as an interviewer-administered rating, which
required the interviewer to actually talk with each person for whom a LESA or
non-LESA categorization was to be made, would require extensive replanning and
rebudgeting on the part of Census. Thus, the obvious first priority of the analysis
of the field test data was to ascertain the degree of relationship between individual
MELP questions and the criterion variables. If several of them showed relatively
high and consistent relationships with the criteria across all groups, then some
"mapping" of those questions onto LESA and non-LESA categories was clearly the MELP
of choice. This chapter summarizes the relationships of the various individual
MELP questions to the criteria. In fact, high and stable (across groups)
relationships were found, and thus a set of such questions was forwarded to NCES on
October 2, 1975 for use in the SIE. Also covered in this chapter are the rules used
to quantify the responses to the MELP questions for further statistical analysis.

The remainder of the project work, then, was devoted to constructing "scoring 
keys" for these questions -- that is, procedures for categorizing an individual as 
LESA or non-LESA on the basis of his quantified responses to the MELP questions. 
Those activities and their results are summarized in Chapters VII and VIII. 

1. "Cleaning" the Data Files

Before any analyses of the field test data were done, the files were examined
so that any data gathered from respondents who were irrelevant to the project could
be eliminated. In particular, only the data from people with non-English language
backgrounds were appropriate to be analyzed since only they would be administered
the MELP in the SIE. Therefore, the data from respondents who met all three of
the following conditions were eliminated permanently from the data files:

a. No other language but English present in the household.

b. The respondent spoke no other language but English.

c. The respondent was born in the U.S.

The data from 40 children and 14 adults were eliminated from the study as a
result of this procedure.*
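
The elimination rule above can be sketched in a few lines. This is only an illustration of the logic, not the procedure actually run on the project files: the record layout and the field names (household_other_language, speaks_other_language, born_in_us) are hypothetical, since the report does not describe the file format.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical field names; the report does not give the file layout.
    household_other_language: bool  # a language other than English is present in the household
    speaks_other_language: bool     # the respondent speaks a language other than English
    born_in_us: bool                # the respondent was born in the U.S.


def is_excluded(r: Respondent) -> bool:
    """True only when all three elimination conditions (a, b and c) hold."""
    return (not r.household_other_language
            and not r.speaks_other_language
            and r.born_in_us)


def clean(records):
    """Keep every respondent who fails at least one of the three conditions."""
    return [r for r in records if not is_excluded(r)]
```

Because the three conditions are joined by "and", a U.S.-born monolingual English speaker living in a household where another language is present would be retained.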

2. Relationships of Individual Questions to the Criteria

All analyses were accomplished within the framework of the SPSS statistical 
system. The basic analysis device was a simple contingency table where the 
responses to each census question were cross-tabulated with test total scores 
and list information (where available) separately for each of the populations 
represented in the field test as follows: 

a. Children: 

1) Cubans 

2) Chicanos 

3) Chinese 

4) Other Asians 

5) Navajos from Ganado schools 

6) Navajos from Window Rock schools 

b. Adults:

1) Cubans

2) Chicanos

3) Chinese

4) Other Asians

5) Navajos

* It was later ascertained that most of the children who were eliminated were from
monolingual families who had requested placement in the bilingual program to learn
the non-English language.

The Navajo children were split by school district when their data were
cross-tabulated with school list because the two school districts from which the
field test sample was drawn had very different methods of assigning children to
lists. School list information was only available for Cubans and Chicanos.

All contingency tables that included test scores were constructed by arbitrarily
dividing the test scores into ten-point intervals. The possible range for the
children's test was 0-67; the possible range for the adults' test was 0-57.

For each two-way cross-tabulation (question responses by list or test for
a given subpopulation), several summary statistics were computed. On the
recommendation of Dr. Robert Mason of RTI, the two indices used were Cramer's V
(Cramer, 1945) and the correlation ratio, eta. The former was used where the
responses to a question were not orderable on a continuum (e.g., origin or
descent), while eta was used when the response categories were ordered. In the
latter case the eta was computed using the question responses as predictors and
test or list as the predicted variable.

To facilitate the examination of the several hundred cross-tabulations, a
two-day conference was convened of the following individuals:

Burton Fisher, University of Wisconsin
John Upshur, University of Michigan
Protase Woodford, Educational Testing Service
Harold Yee, Asian Inc. (San Francisco)
Robert Mason, RTI
Alberto Key, Howard University and CAL
Margaret Bruck, McGill University and CAL
G. Richard Tucker, McGill University
Walter Stolz, CAL
Leslie Silverman, NCES
Vicki Kojsich, NCES
David Orr, NCES

Included in this group were specialists in language testing, survey research,
statistics, linguistics, psychometrics and bilingualism. In addition, three of the
specialists were members of three of the largest ethnic groups to be surveyed by
the SIE.
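
The two association indices named above, Cramer's V and the correlation ratio eta, can be illustrated with a short sketch. The report's values were produced in SPSS; the code below is not that implementation but a minimal pure-Python rendering of the textbook definitions. The function names and the choice to bin scores by integer division are assumptions made for illustration, and the report does not say whether eta was computed on raw or binned scores.

```python
from collections import Counter, defaultdict
from math import sqrt


def bin_score(score, width=10):
    """Ten-point interval index for a test score (0-9 -> 0, 10-19 -> 1, ...)."""
    return score // width


def cramers_v(xs, ys):
    """Cramer's V = sqrt(chi2 / (n * (min(rows, cols) - 1))) for two categorical lists."""
    n = len(xs)
    row, col, cell = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    chi2 = 0.0
    for r, n_r in row.items():
        for c, n_c in col.items():
            expected = n_r * n_c / n
            chi2 += (cell.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


def eta(groups, values):
    """Correlation ratio: sqrt(between-group SS / total SS), with groups as predictor."""
    grand = sum(values) / len(values)
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                     for vs in by_group.values())
    ss_total = sum((v - grand) ** 2 for v in values)
    return sqrt(ss_between / ss_total) if ss_total > 0 else 0.0
```

For an unordered question such as origin or descent, one would call cramers_v(responses, [bin_score(s) for s in scores]); for an ordered question, eta(responses, scores) treats the question responses as the predictor, as the text describes.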

The conference was held September 22-24 at CAL, with Leann Parker and Evangeline 
Kamitsuka providing logistical support. Although the discussion of the data ranged 
over many topics during the two days, the basic question selection procedure used 
by the group was as follows:

1. Summary tables were created (separately for children and adults) in which
only the Cramer's V and/or the eta was entered for each question/criterion-measure/
subpopulation combination.

2. Questions with consistently high indices of association were selected for
further examination. Generally speaking, for a question to be selected, its
Cramer's V values had to exceed .20 in every subpopulation (except Window Rock when
the cross-tabulation was with list).

3. The cross-tabulations for the selected questions were examined to make
sure that the pattern of association between the question responses and criterion
was the same within all subpopulations.

4. The data for the discarded questions were perused once more to ascertain
that no question had been wrongly eliminated.
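
Step 2 of the procedure above amounts to a simple threshold screen, sketched below. This is an illustration rather than the group's actual working method, which the text qualifies with "generally speaking". The function name and the dictionary layout are hypothetical, the screen is applied here to whichever index (V or eta) a question carries, and the sample figures are the question 6 and question 17 rows of Tables 1 and 2, read on the assumption that the tables print the indices without a leading decimal point (625 read as .625).

```python
def passes_screen(indices, threshold=0.20, exempt=()):
    """True when the index of association exceeds the threshold in every
    subpopulation that is not listed in `exempt`."""
    return all(value > threshold
               for group, value in indices.items()
               if group not in exempt)


# Question 6 and question 17 against test total score (children).
q6_test = {"Cubans": .625, "Chicanos": .636, "Asian (exc. Chinese)": .523,
           "Chinese": .544, "Navajos": .509}
q17_test = {"Cubans": .103, "Chicanos": .050, "Asian (exc. Chinese)": .092,
            "Chinese": .045, "Navajos": .064}

# Question 6 against school list (children); Window Rock is exempt from the rule.
q6_list = {"Cubans": .580, "Chicanos": .659, "Asian (exc. Chinese)": .347,
           "Chinese": .537, "Navajo (Ganado)": .420, "Navajo (Window Rock)": .133}

q6_selected = (passes_screen(q6_test)
               and passes_screen(q6_list, exempt=("Navajo (Window Rock)",)))
q17_selected = passes_screen(q17_test)
```

Under these assumptions question 6 passes both screens and question 17 fails, which agrees with their eventual fates in the recommended set.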



The summary tables from which the group worked are reproduced as Tables 1
through 4.* Underlined rows correspond to questions recommended to NCES as MELP
questions on October 2, 1975. The field test questionnaire is reprinted as Figure 1
and the final wordings of the MELP questions as recommended to NCES are given in
Section 5 of this chapter.

Comments on Tables 1 and 2: 

1. It was assumed that questions 1, 2, 3, 4 and 21 would be present in the 
SIE questionnaire regardless of their usefulness as LESA indicators and thus they 
were not included in the recommended MELP questions even though most of them were 
highly related to the criteria. 

2. Question 5 was retained as one of the MELP items proposed for inclusion 
because it was part of questions 6 and 7. (Its relationships to the criteria were 
low because virtually no children were characterized by the household respondents 
as neither speaking nor understanding any English.)

3. Question 27 was another way of phrasing questions 5, 6, and 7. It had 
been used in the NCES supplement to the July CPS and so was used here, but it was 
judged more difficult to understand than 5, 6, and 7 and so was not selected for the 
final MELP. 

4. For Cubans, the relationship of question 31 to the criteria was low
because the household language was almost universally Spanish in that group.

With respect to tables 3 and 4 it should be noted that the relationships 
between the questions and the adult's list classification are generally lower than 
between the questions and test scores. 



* Relationships of questions to DORP scores were also inspected by the group during
the selection process, but because of incomplete data they did not play a central
role in the selection.

Table 1: Children: Cross Tabulations of Responses to Questions with Test Total Scores

Numbers are Cramer's V, except where * appears after questions; Etas are given
for asterisked questions.

MELP Questions   Cubans   Chicanos   Asian (Exc. Chin.)   Chinese   Navajos

1.               415      ?47        419                  654       416
2.               202      200        185                  133       141
3.               217      249        119                  164       113
4. *             447      567        274                  471       --
5. *             247      237        092                  140       256
6. *             625      636        523                  544       509
7. *             634      616        518                  491       402
9.               128      176        285                  133       159
10. *            163      327        286                  272       368
11. *            150      351        346                  120       340
12. a.           256      286        219                  216       535
    b.           197      380        179                  249       624
    c.           147      318        202                  199       305
    d.           278      326        253                  247       289
13. *            263      385        239                  238       234
14. * a.         246      234        315                  353       308
      b.         410      262        249                  322       287
      c.         469      347        340                  497       259
15. *            118      208        349                  060       088
16. *            211      117        000                  156       020
17. *            103      050        092                  045       064
18. *            107      011        183                  174       119
19. *            163      295        413                  0?0       250
20. *            [illeg.] 195        079                  049       092
21. *            526      209        399                  650       538
22.              602      246        458                  584       474
23. *            [illeg.] 512        [illeg.]             235       437
24. *            174      518        135                  395       380
25. *            163      034        036                  032       085
27.              281      310        296                  262       274
28. *            345      302        146                  289       235
31.              128      469        246                  208       320
32.              167      406        208                  213       342
Table 2 - Children: Cross tabulations of Responses to Questions with School
List Information

Numbers are Cramer's V except where * appears after question; Etas are given
for asterisked questions.

MELP Questions   Cubans   Chicanos   Asian (Exc. Chin.)   Chinese   Navajo (Gan.)   Navajo (WR)

1.               226      031        011                  240       173             048
2.               154      403        267                  --        223             089
3.               140      337        429                  248       123             034
4. *             561      616        377                  277       --              --
5. *             172      [illeg.]   --                   --        070             087
6. *             580      659        347                  537       420             133
7. *             516      657        378                  498       360             257
9.               076      282        416                  241       148             158
10. *            100      522        354                  399       420             234
11. *            [illeg.] --         306                  312       353             392
12. a.           250      698        384                  379       243             267
    b.           076      750        403                  422       263             270
    c.           117      607        368                  234       228             206
    d.           258      482        214                  360       269             235
13. *            268      418        322                  [illeg.]  326             142
14. a. *         118      154        159                  486       123             114
    b. *         257      165        234                  355       213             047
    c. *         281      191        391                  301       189             066
15. *            115      258        227                  197       128             096
16. *            146      075        103                  120       120             010
17. *            053      034        117                  031       149             127
18. *            029      044        162                  366       041             067
19. *            165      247        353                  271       362             109
20. *            07?      [illeg.]   198                  161       125             191
21. *            [illeg.] [illeg.]   [illeg.]             ?96       26?             495
22. *            [illeg.] 191        [illeg.]             [illeg.]  187             484
23. *            [illeg.] [illeg.]   [illeg.]             [illeg.]  194             [illeg.]
24. *            [illeg.] [illeg.]   [illeg.]             [illeg.]  [illeg.]        ?64
25. *            [illeg.] [illeg.]   [illeg.]             [illeg.]  [illeg.]        [illeg.]
26.              [illeg.] [illeg.]   [illeg.]             ?78       [illeg.]        456
27.              119      536        295                  368       359             105
28. *            216      309        093                  245       161             173
29. *            045      048        040                  --        136             080
31.              068      717        426                  376       312             123
32.              154      667        423                  373       242             117



Table 3 - Adults: Cross tabulations of Responses to Questions with Test Total Score

Numbers are Cramer's V except where * appears after question; Etas are given
for asterisked questions.

MELP Questions   Cubans   Chicanos   Asian (Exc. Chin.)   Chinese   Navajo

1.               166      051        392                  316       212
2.               104      183        280                  213       168
3.               135      153        331                  298       142
4. *             225      235        279                  270       000
5. *             376      311        105                  547       288
6. *             561      477        534                  703       645
7. *             519      467        565                  672       592
9.               110      115        371                  243       102
10. *            150      147        496                  180       220
11. *            157      165        456                  333       224
12. a.           120      426        373                  351       253
    b.           162      135        347                  324       191
    c.           159      214        322                  308       143
    d.           201      198        360                  386       215
    e.           208      145        286                  338       269
13. *            281      347        336                  361       425
14. a. *         450      295        389                  578       564
    b. *         493      388        45?                  562       382
    c. *         399      266        366                  620       434
15. *            113      116        213                  428       110
16. *            069      175        130                  424       143
17. *            154      039        016                  051       145
18. *            133      141        074                  086       253
19. *            113      262        279                  262       120
20. *            091      057        168                  051       041
21. *            474      348        512                  666       715
22. *            365      412        581                  668       667
23. *            143      190        276                  306       263
24. *            205      320        456                  616       287
25. *            009      157        106                  051       051
26.              691      438        707                  829       543
27.              290      180        366                  416       314
28. *            240      200        347                  253       389
29. *            161      094        197                  103       251
30-E.            191      301        258                  298       321
31.              106      185        271                  360       407
32.              219      177        425                  301       284

Table 4: Adults: Cross tabulations of Responses to Questions with School List
Information

Numbers are Cramer's V except where * appears after question; Etas are given
for asterisked questions.

MELP Questions   Cubans   Chicanos

1.               151      023
2.               133      198
3.               093      109
4. *             064      177
5. *             255      058
6. *             416      229
7. *             321      113
9.               089      029
10. *            125      129
11. *            150      102
12. a.           082      10?
    b.           078      127
    c.           082      152
    d.           083      143
    e.           085      129
13. *            183      106
14. * a.         334      161
      b.         350      144
      c.         331      138
15. *            058      075
16. *            105      092
17. *            131      067
18. *            097      044
19. *            100      107
20. *            148      159
21. *            148      159
22. *            318      384
23. *            138      100
24. *            161      247
25. *            054      085
26.              671†     406†
27.              237      137
28. *            138      070
29. *            255      074
30-E.            023      040
31.              098      002
32.              --       076

† Based on very small sample sizes

3. An Evaluation of the MELP Questions: Reports from the Monitors

Once the questions had been selected they were examined to see if their wordings
needed to be improved. One source of information relevant to this was
the monitors' observation data and their summary reports submitted at the end of
the field test. The table below gives the results of the monitor observation
system for several of the questions.



Behavioral                                              Question number
Category                             2     5     6     7   12b   12d    21    22    27    31

No Response*                         0     0     0     0     0     0     0     0     0     0
Irrelevant Answer*                   0     0     0     0     0     0     1     0     1     0
Another person Answers*              7     6     9     8     6     9    11     9     8     7
Seeks clarification*                14     1     4     3     2     0     9     6     7     3
Interv. Rephrases*                  20     2     7     5     3     5    14    14    14     7
Interv. uses Native L.*             36    35    30    30    33    34    35    36    37    33
Respondent uses Native L.*          36    35    31    31    35    34    36    35    37    33
Total Frequencies                  376   371   334   333   362   361   366   349   360   366
Sum of N.R., I.A., S.Cl., I.R.*     34     3    11     8     5     5    24    20    22    10

* Percents of total frequencies.
These are pooled across all administrations of the MELP questions that were
monitored. The last row gives the total percent of no responses, irrelevant
answers, seeks clarification, and interviewer rephrases and might be taken as a
general index of the difficulty of administration of the question. The troublesome
questions were clearly: #2 (origin and descent), #21 (level of education), #22 (years
of education in English), and #27 (CPS question rating English proficiency).
Comments from the monitors indicated that:

1. For question 2, the words "origin" and "descent" as well as the concept
of ethnic background often caused difficulty. Navajos needed to have
the word "tribe" substituted.

2. In question 21, there was often uncertainty about how to translate
foreign schooling into U.S. terms.

3. In question 22, there was sometimes an ambiguity between years of
having been taught the English language and years of instruction
in content areas using English as the medium of instruction. The
latter was intended.

4. Question 27 was double-barreled and the alternative responses were
extremely difficult to understand.

5. In responding to Question 31, some respondents indicated that both
languages were used equally often and they had to be prodded into
making a forced choice.

6. For questions 6 and 7 most problems involved the term "adequately".

7. Finally, it was suggested that question 7 be placed before question 6
because often the word "speak" was initially taken in its generic sense
meaning both speak and understand. However, if the question about
understanding was placed first, the proper sense of "speak" would be
suggested to most respondents.

Originally we anticipated that the two categories "interviewer uses native
language" and "respondent uses native language" would be indicative of
difficulties in communicating a question or an answer in English. This may have been
the case for the monolingual English-speaking interviewers, but according to the
monitors' comments, it was not the case in the interviews conducted by bilingual
interviewers. In the latter case, the interviewers found that it was viewed by
respondents as a lack of courtesy for the interviewer to attempt to conduct the
interview in English (as was their instruction) when it was difficult and/or
embarrassing for the respondent to do so and when the interviewer was clearly
competent in the respondent's native language. Thus, interviews were frequently
conducted in the native language even when, according to the monitor's judgment,
they could have been conducted mostly or entirely in English. Accordingly, these
behavioral categories were not interpreted as originally planned.
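
The general difficulty index described in this section (the last row of the monitor table) is the sum of four of the behavioral categories, and that sum can be checked directly. In the sketch below the percentages are read from the monitor table; the first "Interv. Rephrases" entry is taken as 20, the value implied by the printed sum of 34 for question 2. The cutoff of 20 percent used to flag a question is an assumption chosen for illustration, since the report names the troublesome questions without stating a threshold.

```python
questions = ["2", "5", "6", "7", "12b", "12d", "21", "22", "27", "31"]

# Percents of total frequencies, one entry per question, in the order above.
no_response       = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
irrelevant_answer = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]
seeks_clarif      = [14, 1, 4, 3, 2, 0, 9, 6, 7, 3]
interv_rephrases  = [20, 2, 7, 5, 3, 5, 14, 14, 14, 7]

# Difficulty index: sum of the four categories for each question.
difficulty = {
    q: nr + ia + sc + ir
    for q, nr, ia, sc, ir in zip(questions, no_response, irrelevant_answer,
                                 seeks_clarif, interv_rephrases)
}

CUTOFF = 20  # hypothetical threshold, not stated in the report
troublesome = [q for q in questions if difficulty[q] >= CUTOFF]
```

With these figures the index reproduces the printed last row, and the four questions at or above the assumed cutoff are the same four the text singles out.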

4. Modifications to the "How Well" Questions 

From the beginning of the field test it was clear that the set of response
alternatives to the "how well" questions (#6 and #7) could be improved. After a
week of field testing with the set "very well", "well", "adequately", "just a
little", and "not at all", the term "adequately" was replaced by two alternatives:
"adequately for most purposes" and "adequately for only a few purposes". (CAL staff
considered "adequately" to be overly ambiguous.) However, this did not solve the
problem. The word remained highly ambiguous to some, and to others it was simply
unfamiliar.

Also, the term "well" proved to be non-discriminative. In fact, analysis of
the data showed that "well", "adequately for most purposes", and "adequately" were
all applied to people of about the same English proficiency level as measured by test
score. Table 5 gives the mean test score for respondents to whom each response
alternative was applied. For example, the mean test score of all adults who rated
themselves as speaking English "very well" was 43, and the mean test score of all
children who were rated as speaking "very well" was 56.

Table 5: Mean test scores for each response alternative in the question
rating English proficiency. (Pooled across ethnic groups)

Response Alternative            Adult                  Child
                           Speak  Understand      Speak  Understand

Very Well                    43       41            56       55
Well                         34       33            49       52
Adequately for most          34       31            48       47
Adequately                   32       29            46       46
Adequately for few           28       25            40       39
Just a little                17       17            37       36



For adults, the average difference between the means of successive alternatives
among "well", "adequately for most purposes", and "adequately" was 1.5, compared
with an average difference of 7.3 between all other successive alternatives. For
children, the average difference among the three was 2.25, while the average
difference between all other successive pairs was 4.83. On the basis of this
analysis, it was decided to collapse the three alternatives into a single scale
position. After consultation with a number of the CAL staff, the following response
alternatives were agreed upon and included in CAL's October 2 memorandum to NCES:

1. Very well
2. All right
3. Enough to get by
4. Just a few words
5. Not at all
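
The averages that motivated collapsing the three middle alternatives can be recomputed from the Table 5 means. The sketch below is a check on the arithmetic only; the variable and function names are illustrative, and pooling the "speak" and "understand" columns within each age group is an assumption that happens to reproduce the figures quoted in the text.

```python
# Table 5 means, ordered: very well, well, adequately for most,
# adequately, adequately for few, just a little.
ADULT = {"speak": [43, 34, 34, 32, 28, 17], "understand": [41, 33, 31, 29, 25, 17]}
CHILD = {"speak": [56, 49, 48, 46, 40, 37], "understand": [55, 52, 47, 46, 39, 36]}

# Gap i is the difference between alternative i and alternative i + 1.
# Gaps 1 and 2 lie inside the group well / adequately for most / adequately.
WITHIN_GROUP_GAPS = {1, 2}


def gap_averages(table):
    """Return (average gap inside the three-alternative group, average of all other gaps)."""
    within, other = [], []
    for means in table.values():
        for i in range(len(means) - 1):
            gap = means[i] - means[i + 1]
            (within if i in WITHIN_GROUP_GAPS else other).append(gap)
    return sum(within) / len(within), sum(other) / len(other)


adult_within, adult_other = gap_averages(ADULT)   # 1.5 and about 7.3
child_within, child_other = gap_averages(CHILD)   # 2.25 and about 4.83
```

In both age groups the gaps inside the three-alternative group are well under half the size of the remaining gaps, which is the pattern the decision rests on.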
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FIGURE 1 

(items selected for MELP are starred)

BILINGUAL STUDY
CENSUS QUESTIONNAIRE

O.M.B. No. ______
Expires ______

ID No. of DR ______    Sex ______
FI ______    FI No. ______    Date ______
Type (✓):  □ Self Report   □ Second Hand Report

1. What is . . .'s date of birth?
   Month ____  Day ____  Year ____

2. What is . . .'s origin or descent? (USE FLASH CARD A)

3. In what state or foreign country was . . . born? (USE FLASH CARD B)

4. When did . . . come to the U.S. to stay?

5. Does . . . speak or understand any English?
   1. Yes
   2. No (SKIP TO Q.8)
   3. Don't know (SKIP TO Q.8)

6. How well does . . . speak English? (READ ANSWER CHOICES 1-5)
   1. Very well        4. Just a little
   2. Well             5. Not at all
   3. Adequately       6. Don't know

7. How well does . . . understand spoken English? (READ ANSWER CHOICES 1-5)
   1. Very well        4. Just a little
   2. Well             5. Not at all
   3. Adequately       6. Don't know

8. What (OTHER) languages does . . . speak? (USE FLASH CARD C)
   (IF NONE, SKIP TO Q.12. IF ONLY ONE, SKIP TO Q.10)

9. Which of these languages does . . . speak most often? (USE FLASH CARD C)

10. How well does . . . speak (PRINCIPAL LANGUAGE FROM Q.8 OR Q.9)?
    (READ ANSWER CHOICES 1-4)
    1. Very well        4. Just a little
    2. Well             5. Don't know
    3. Adequately

11. How well does . . . understand (PRINCIPAL LANGUAGE FROM Q.8 OR Q.9)?
    (READ ANSWER CHOICES 1-4)
    1. Very well        4. Just a little
    2. Well             5. Don't know
    3. Adequately

12. What language does . . . usually speak when talking to: (USE FLASH CARD C)
    * a. brothers and sisters?
      b. parents?
      c. other older relatives?
    * d. . . .'s best friend?
    * e. (IF . . . IS AN ADULT) children in the household?

13. During the past year, did . . . have difficulty reading books because
    they were in English?
    1. Yes
    2. No
    3. Don't know

14. How often does . . . read:
    * a. an English-language newspaper? (READ ANSWER CHOICES)
         1. Often
         2. Occasionally
         3. Not at all
      b. magazines in English? (READ ANSWER CHOICES)
         1. Often
         2. Occasionally
         3. Not at all
      c. books in English? (READ ANSWER CHOICES)
         1. Often
         2. Occasionally
         3. Not at all

15. How often does . . . read newspapers, magazines, or books in a
    language other than English? (READ ANSWER CHOICES)
    1. Often
    2. Occasionally
    3. Not at all

16. At any time during the past year, did . . . attend regular school in
    the U.S.?
    1. Yes
    2. No

17. During the past year, did . . . take any courses at business, vocational
    or technical school?
    1. Yes
    2. No
    3. Don't know

    (IF "NO" OR "DON'T KNOW" TO BOTH Q's 16 AND 17, SKIP TO Q.20)

18. In any school or course attended during the past year, was . . . taught
    in a language other than English?
    1. Yes
    2. No
    3. Don't know

19. During the past year has a teacher, counselor, or school official said
    that . . . had difficulty speaking or understanding English?
    1. Yes
    2. No
    3. Don't know

20. At any time during the past year did . . . take any course or class for
    people whose principal language is not English?
    1. Yes
    2. No
    3. Don't know

21. What is the highest grade or year of regular school . . . has ever
    attended? (USE FLASH CARD D)
    (IF "NONE" SKIP TO 27. IF "DON'T KNOW," SKIP TO Q.23)

22. How many years of . . .'s schooling was taught in English?

23. Did . . . speak English before going to school for the very first time?
    1. Yes
    2. No (SKIP TO Q.25)
    3. Don't know (SKIP TO Q.25)

24. How well did . . . speak English before going to school for the very
    first time? (READ ANSWER CHOICES 1-4)
    1. Very well        4. Just a little
    2. Well             5. Don't know
    3. Adequately

25. Has . . . ever repeated a grade in school?
    1. Yes
    2. No (SKIP TO Q.27)
    3. Don't know (SKIP TO Q.27)

26. What grade(s) did . . . repeat?

27. Does . . . have any difficulty in speaking or understanding English?
    (READ ANSWER CHOICES)
    1. Yes, difficulty in both speaking and understanding
    2. Yes, difficulty only in speaking
    3. Yes, difficulty only in understanding
    4. Yes, doesn't speak or understand at all
    5. No, no difficulty in speaking or understanding
    6. Don't know

28. Does . . . prefer to avoid places where only English is spoken?
    1. Yes
    2. No
    3. Don't know

29. During the past year has . . . been employed at any time?
    1. Yes
    2. No (SKIP TO Q.31)
    3. Don't know (SKIP TO Q.32)

30A. For whom did . . . work? (NAME OF COMPANY, BUSINESS, ORGANIZATION,
     OR OTHER EMPLOYER)

30B. What kind of business or industry is this? (FOR EXAMPLE, TV AND RADIO
     MANUFACTURING, RETAIL SHOE STORE, STATE LABOR DEPARTMENT, FARM)

30C. What kind of work did . . . do? (FOR EXAMPLE, ELECTRICAL ENGINEER,
     STOCK CLERK, TYPIST, FARMER)

30D. What were . . .'s most important activities or duties? (FOR EXAMPLE,
     TYPES, KEEPS ACCOUNT BOOKS, FILES, SELLS CARS, OPERATES PRINTING
     PRESS, FINISHES CONCRETE)

30E. At work, what language does . . . usually speak? (USE FLASH CARD C)

31. What is the usual language spoken in this household? (USE FLASH CARD C)

32. What other languages are spoken in this household? (USE FLASH CARD C)

The question of whether to have a separate screening item such as "Does
. . . speak or understand any English?" or to have the "not at all" alternative
of the "how well" items identify such persons was left to NCES. It was found
that few adults or children (10% and 2% respectively) were recorded as neither
speaking nor understanding any English, and thus the decision as to whether
question 5 should be retained was left to the designers of the final SIE
questionnaire. Such a question could be useful more as a device for moving to a new
topic than for the information it yields by itself.

5. MELP Questions as Recommended to NCES

On October 2, 1975 the following questions were recommended to NCES for
inclusion in the SIE instrument.

A. "How well" questions:

   1. Does . . . speak or understand any English?
      1. Yes
      2. No
      3. Don't know

   2. How well does . . . understand spoken English?
      1. Very well
      2. All right
      3. Enough to get by
      4. Just a few words
      5. Not at all

   3. How well does . . . speak English?
      1. Very well
      2. All right
      3. Enough to get by
      4. Just a few words
      5. Not at all

B. English usage questions:

   1. What is the usual language spoken in this household? (To
      be asked only once of the household respondent; interviewer
      coded for each member of the household.)

   2. What language does . . . usually speak when talking to:
      a. brothers and sisters? (children only)
      b. . . .'s best friend?

C. Questions about reading habits:

   1. How often does . . . read an English-language newspaper?
      (Adults only)
      1. Often
      2. Occasionally
      3. Not at all

D. Educational questions:

   1. How many years of . . .'s schooling was taught in English?

II. Questions forwarded for inclusion in the SIE questionnaire on
the recommendation of the Language Group Representatives.

   1. How well does . . . understand spoken [principal non-
      English language (from III, 8a and b)]?
      1. Very well
      2. All right
      3. Enough to get by
      4. Just a few words
      5. Not at all

   2. How well does . . . speak [principal non-English language]?
      1. Very well
      2. All right
      3. Enough to get by
      4. Just a few words
      5. Not at all

III. Non-MELP questions: It was our understanding that the following questions
would be asked for reasons other than to categorize individuals as LESA or not;
however, we assumed that they would be available for incorporation into the
MELP.

   1. What is . . .'s date of birth?

   2. What is . . .'s origin or descent ("tribe" if Native
      American)?

   3. In what state, U.S. territory, or foreign country was
      . . . born?

   4. When did . . . come to the U.S. mainland to stay? [Skip
      if answer to preceding question was "this state" or
      "different state".]

   5. How many years of . . .'s schooling was not on the U.S.
      mainland?

   6. What is the highest grade or year of regular school . . .
      has ever attended?

   7. What other languages are spoken in this household? (to follow
      question B1)

   8. a. What other languages (besides English) does . . . speak?
      b. Which of these languages does . . . speak most often?

6. Definitions of the MELP Variables

Once the MELP questions had been selected for use in the SIE, there remained
the task of quantifying the responses to them so that they could be entered into
further statistical analyses to derive one or more "scoring keys". Such scoring
keys would determine how any given individual would actually be classified as LESA
or not on the basis of his MELP responses. The quantified responses to the MELP
questions will be called the MELP variables. There were ten MELP variables for
children and 11 for adults. They are defined below. The labels in capital letters
will be used to refer to the various MELP variables henceforth. Questionnaire
numbers refer to those in Figure 1.

Child MELP Variables 

A. Length of time in U.S. (WHEN): This variable was a composite of 
questionnaire items #3 and #4, and it had three possible values. 

1 - Born outside the U.S. and came to U.S. after 1972 

2 - Born outside the U.S. and came to U.S. before 1973 

3 ~ Born in the U.S 

B. Rating of proficiency in Speaking English (SPEAK): Derived from 
items #4 and #5, and scored on a scale of 1 through 5: 

1 - Does not speak any English at all 

2 - Speaks just a little 

3 - Speaks adequately for a few purposes 

4 - Speaks adequately; adequately for most purposes, or well 

5 - Speaks very well 

Any missing data were given the value of 2. 

C. Rating of proficiency in understanding spoken English (UNDERSTAND): 
Also scored on a 1 to 5 scale using the same scale labels as SPEAK 
only with the word "understand" replacing each occurrence of "speak."
Derived from items #4 and #6. Any missing data were given the value
of 2.

D. Usual language spoken in the household (HLANG): This was a three-valued
variable derived from item #31.

1 - not English

2 - any missing data

3 - English

E. Usual language spoken with brothers and sisters (SIB): Scored
exactly as was HLANG. Derived from item #12a.

F. Usual language spoken with best friend (FRIEND): Scored exactly as
was HLANG. Derived from item #12d.

G. Number of years of formal education in which English was the language
of instruction (YEARS). Derived from item #22.

H. Year of birth (BIRTH). Derived from item #1.

I. Grade in school (GRADE). Derived from item #21.

J. Highest year of formal education attained by the head of the child's
household (PARENT). Derived from item #6 of the Household Information
Form (see Appendix 16).
Adult MELP Variables. Most of the MELP variables for adults were identical to
those used for children as defined above. In particular, WHEN, SPEAK, UNDERSTAND,
FRIEND, HLANG, YEARS, BIRTH, and GRADE were the same. SIB was dropped for adults
because many adults either did not have living siblings or talked with them only
very rarely. PARENT, of course, was also dropped.

Three new variables were added: (a) INCOME was taken from the Household
Information Form. It asked: "What was the total income of this family during
the past year? (This includes wages and salaries, net income from business or
farm, pension, dividends, interest, rent, social security payments, and any other
money income received by members of this family.)" The response alternatives were:

1. $0 - 4,000            4. $15,000 - 19,000

2. $5,000 - 9,999        5. $20,000 and over

3. $10,000 - 14,999      6. Don't know

(b) NEWS was taken from question #14a of Figure 1. It asked "How often does
. . . read an English newspaper?" The alternatives were "Often", "Occasionally",
and "Not at all", and were scored 1, 2, and 3 respectively.

(c) KID was taken from question 12e. It asked for the language normally 
spoken with children in the household. "English" was scored as 3, any other lan- 
guage as 1, and no response as 2. 

The treatment of missing data. In any data collection there will be some protocols
which have missing or unusable data for some variables. The reasons for missing
data are many. They include refusal or inability of the respondent to answer the
question, failure of the interviewer to ask the question or to record the response,
and errors in the procedures by which the data are transferred from the questionnaires
to computer-readable tapes. For some MELP variables, missing data for an
individual respondent caused all of the data from that respondent to be dropped
from the analysis; however, for SPEAK, UNDERSTAND, FRIEND, SIB, and HLANG, a value
was substituted (see above) if data were missing. In the case of SPEAK and UNDERSTAND,
missing data were coded as "2" ("Just a little") since it was a popular
option, and we assumed that missing data on these items were more likely to occur
for respondents who were not proficient in English than for those who were more
proficient. For FRIEND, SIB, and HLANG, a middle value was used.

Missing data were extremely rare for these variables in any case. About 4%
of the responses to SPEAK and UNDERSTAND were either missing or "don't know", as
were about 2% of the responses to FRIEND, SIB, and HLANG. These rates were for
adults answering about themselves and the Household Respondent answering about a
child; comparable rates for the Household Respondent answering for another adult
in the household were slightly higher (see Chapter IX); however, these latter, proxy
data were not used in the derivation of the scoring keys.
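The substitution rules just described can be sketched in a few lines of Python. This is an illustration only: the function names and the dictionary layout of a respondent record are hypothetical, and the rule that a missing value on any variable without a listed substitute drops the whole respondent is an assumption (the report does not say which variables triggered exclusion).

```python
from typing import Optional

# Substitute codes named in the text: "2" ("Just a little") for SPEAK and
# UNDERSTAND, and the middle value (2) for the three-valued language items.
MISSING_SUBSTITUTE = {"SPEAK": 2, "UNDERSTAND": 2, "FRIEND": 2, "SIB": 2, "HLANG": 2}


def code_language(response: Optional[str]) -> int:
    """Score HLANG, SIB or FRIEND: 3 = English, 1 = not English, 2 = missing."""
    if response is None or response.strip() == "":
        return 2
    return 3 if response.strip().lower() == "english" else 1


def fill_missing(record: dict) -> Optional[dict]:
    """Apply the substitutes; return None when the respondent must be dropped.

    Assumption: a missing value (None) on any variable that has no listed
    substitute excludes the respondent from the analysis.
    """
    filled = {}
    for name, value in record.items():
        if value is None:
            if name not in MISSING_SUBSTITUTE:
                return None
            value = MISSING_SUBSTITUTE[name]
        filled[name] = value
    return filled
```

Under these assumptions a record with SPEAK missing keeps its other values and receives SPEAK = 2, while a record with WHEN missing is excluded.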



VI. The Criterion Variables



Major objectives of this study were to select a set of MELP questions and to
establish concurrent validity for them by comparing responses to them with other
measures of Limited English-Speaking Ability. The point has already been made that
although no paucity exists of instruments for assessing English proficiency, there
is presently no single, widely accepted such measure on which we could rely to
obtain the "true" categorization (LESA or non-LESA) of each individual in the field
test. Thus, our position was one of having several different measurement approaches
to English proficiency, all admittedly quite fallible, against which to develop
our MELP. Previous chapters have elaborated on the development of three such
criterion measures: school list information, a discrete point test, and a direct
observation rating procedure (DORP). The discussions in Chapters I and II indicate
that these alternatives cannot be ordered among themselves as being "better" or
"worse" measures of LESA; they are simply different from each other, with different
strengths and weaknesses. The purpose of this chapter is to define each of these
measures in detail as used in this study and to present the relationships among them.

1. The Test 

Chapter III described in detail the development of two discrete point tests, 
one for children (younger than 14) and one for adults (14 and older). This sec- 
tion reports the preliminary statistical analyses performed on those tests. 

To review briefly, each test was composed of three subtests, one of aural 
comprehension, one of oral production, and one of oral communication. The children's 
test was composed of 47 items and 57 possible points. The means and standard 
deviations of the test scores (total points obtained) for each ethnic-linguistic 
group of children were as follows:

Group                  Sample size    Mean    Standard Dev.

Cubans (Miami)             317        45.0        16.6
Chicanos (El Paso)         364        42.2        16.1
Chinese (S.F.)             146        47.5        12.9
Other Asian (S.F.)         133        54.3         8.6
Navajo (Arizona)           260        52.0        12.3
Overall                   1220        47.0        15.1

Comparable information for adults was:

Group                  N       Mean    Standard Dev.

Cubans                272      18.8        13.7
Chicanos              202      14.7        12.8
Chinese               111      24.4        17.3
Other Asians          116      39.6        11.2
Navajos               214      39.9        13.7
Overall               915      26.1        17.3

Although the means order themselves similarly across the groups, the range of
adult means is considerably greater than the range of child means. Generally speaking,
the Spanish speakers scored relatively low on the tests while the Other Asians
and Navajos scored quite high.* The Chinese showed an intermediate degree of proficiency,
with the adults having a particularly large amount of within-group variability.

Although these tests were made up of three subtests each, the requirement
was for a single, global measure of English proficiency rather than three measures.
Two alternatives suggested themselves. The first was simply to use the total number
of points scored on the test as an individual's score and assume that the test
in fact measured a single dimension interpretable as English proficiency. This was
what was done in early analyses of the data, including those described in Chapter V.
The second approach was to explore the dimensionality of the test empirically and
to construct a unidimensional score for each respondent by weighting the scores of
the items in differential ways. This avenue was explored using principal
components factor analysis. Factor analysis is a general statistical technique
which analyzes the co-variation of a number of variables in terms of new variables
constructed to be mutually uncorrelated with each other. In the present application,
if the test actually measured only a single, unidimensional construct (i.e., English
proficiency), a single new variable (called a factor) should emerge which was much
more prominent than the others, and with which most or all of the original test items
would be highly correlated. To the extent that one or more less important and independent
factors were found to exist, they would be evidence that the test's total score
measures more than simply English proficiency (e.g., IQ, chronological age). A
"purified" (i.e., unidimensional) measure of English proficiency could then be constructed
by computing a "factor score" for each individual. This factor score is
computed by adding the item scores after they have been weighted (multiplied) by
coefficients derived by the principal components procedure.
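
The factor-score idea described above can be illustrated with a minimal sketch. It is not the SPSS procedure used in the study: it assumes NumPy, takes the unrotated first principal component of the inter-item correlation matrix, and omits any rotation step. The function name and its return values are hypothetical.

```python
import numpy as np


def first_factor_scores(items):
    """Return (scores, eigenvalues, n_retained) for a respondents x items array.

    Assumption: the unrotated first principal component is used; no
    rotation is applied in this sketch.
    """
    items = np.asarray(items, dtype=float)
    # Standardize each item (mean 0, SD 1) so the analysis is on correlations.
    z = (items - items.mean(axis=0)) / items.std(axis=0)
    corr = np.corrcoef(items, rowvar=False)
    # eigh returns eigenvalues in ascending order; re-sort to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Count the components with eigenvalues greater than 1.0.
    n_retained = int(np.sum(eigvals > 1.0))
    # Weight the standardized item scores by the first component's coefficients.
    weights = eigvecs[:, 0]
    if weights.sum() < 0:  # the sign of an eigenvector is arbitrary
        weights = -weights
    raw = z @ weights
    # Rescale to mean 0 and standard deviation 1 over the whole sample.
    scores = (raw - raw.mean()) / raw.std()
    return scores, eigvals, n_retained
```

When the items share one dominant dimension, scores computed this way should correlate very highly with the simple total of the item scores, which is the comparison the two alternatives above invite.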

The factor analysis was done separately for children and adults, pooling the
1220 children's test data into a single sample and doing the same for the test
data of the 915 adults. All computation was done using the SPSS principal components
procedures (Nie, et al., 1975).

Children's Analysis. Each item of the children's test was entered as a variable
in the analysis and principal components were taken of the 47 X 47 inter-item
product-moment correlation matrix.

Following the usual convention, as described in the SPSS handbook (Nie, et al.,
1975, p. 493), only components (factors) with eigenvalues greater than 1.0 were
retained. Those eigenvalues are listed below:

factor    eigenvalue    percent of variance

  1         16.26              34.6
  2          3.36               7.1
  3          1.34               2.8
  4          1.19               2.5
  5          1.03               2.2
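
The percent-of-variance column can be checked by hand. In a principal components analysis of a correlation matrix each standardized item contributes one unit of variance, so the total variance equals the number of items (47 here) and each percentage is the eigenvalue divided by 47. The short Python sketch below, with a hypothetical helper name, reproduces the column to within the rounding of the printed eigenvalues.

```python
def percent_of_variance(eigenvalue: float, n_items: int) -> float:
    """Share of total variance for one component of a correlation matrix."""
    return 100.0 * eigenvalue / n_items


# Eigenvalues as printed for the children's test (47 items).
child_eigenvalues = [16.26, 3.36, 1.34, 1.19, 1.03]
child_percents = [percent_of_variance(e, 47) for e in child_eigenvalues]
total_percent = sum(child_percents)

for factor, pct in enumerate(child_percents, start=1):
    print(f"factor {factor}: {pct:.1f}%")
print(f"five factors together: {total_percent:.1f}%")
```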



* In fact, the children's test was too easy for the Other Asians. It seems likely 
that there was a definite ceiling effect for some members of that group. 
The magnitudes of the eigenvalues corresponding to the various factors indicate
their relative importance in terms of variance of the original variables accounted
for. Together the five factors accounted for 49.3% of the total variance in the
correlation matrix. These five factors were then rotated using a quartimax procedure.
The rotated factor matrix is given in Table 1. The entries in this matrix
are the correlations of the test items with the various factors and are called
"factor loadings."

Table 1: Principal Components analysis of the children's test data; quartimax
rotation.

Item  Subtest          F1    F2    F3    F4    F5    h²

  1   Comprehension    41   -10    42   -14    06    37
  2                    36   -23    54    05    05    48
  3                    49   -04    04   -26    22    36
  4                    32   -15    45   -12    17    38
  5                    43   -18    37    08   -11    38
  6                    40    04   -04   -44    06    36
  7                   -03   -07    09    54    71    80
  8                    41    05    22   -32    20    37
  9                    37   -26    42   -18   -11    42
 10                    49   -12   -02   -01    18    29
 11                    33   -07    18   -16   -16    20
 12   Comprehension    53   -13    08   -03    28    39
 13   Production       67   -37    01    13   -03    61
 14                    70   -13   -19   -15   -01    57
 15                    71   -15   -19   -08   -04    57
 16                    69   -42   -07    15   -14    70
 17                    72   -32   -07    11   -13    65
 18                    67   -33   -03    12   -15    60
 19                    73   -24   -07   -05   -04    60
 20                    60    01    02   -22    04    41
 21                    65   -07   -07   -14    20    49
 22                    73   -08   -16   -14    01    59
 23                    68   -12   -03   -11    12    50
 24                    70   -02   -14   -04    05    51

Table 1 continued

Item  Subtest          F1    F2    F3    F4    F5    h²

 25   Production       67   -30   -10    09   -08    57
 26                    71   -21   -06   -01    09    56
 27                    66   -21   -06    03   -00    49
 28                    65   -22   -21    04    06    52
 29                    66   -15   -05   -13    17    51
 30                    66   -04   -07   -09    14    47
 31                    64   -30   -10    13   -02    53
 32                    64   -33   -06    23   -09    58
 33   OC               48    37    09   -02   -01    37
 34                    52    26    01    16   -00    37
 35                    53    29   -04    06   -07    38
 36                    51          02   -01    04    37
 37                    57    34    08    13   -03    46
 38                    59    38    02   -05    05    50
 39                    67    32   -00    16   -07    59
 40                    58    39   -03    09    02    50
 41                    68    38    01    15   -14    65
 42                    64    39    01    05   -01    56
 43                    48          02   -05    04    39
 44                    63    39    01    08   -04    56
 45                    58    36   -01    09   -06    48
 46                    61    46    07    04   -06    59
 47                    65    42    03    07   -04    61






The last column, designated h², contains the sums of the squares of the loadings
in each row. h² can be interpreted as the percentage of each variable's variance
participating in the five factors. That these numbers are relatively low indicates
either that the items had a high degree of singular variation or were
relatively unreliable.
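
As a worked check of this definition, the following Python sketch sums the squared loadings for two rows of Table 1 (items 2 and 7, with decimal points restored). The helper name is hypothetical, and small differences from the printed h² values are to be expected because the printed loadings are rounded to two places.

```python
def communality(loadings):
    """Sum of squared factor loadings for one item (its h-squared)."""
    return sum(value * value for value in loadings)


# Loadings on F1..F5 as printed in Table 1, decimal points restored.
item_2 = [0.36, -0.23, 0.54, 0.05, 0.05]   # printed h-squared: 48
item_7 = [-0.03, -0.07, 0.09, 0.54, 0.71]  # printed h-squared: 80

h2_item_2 = communality(item_2)
h2_item_7 = communality(item_7)
print(round(h2_item_2, 2), round(h2_item_7, 2))
```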

F1 seems to be a general English proficiency factor. It accounts for
almost five times the variance of the second factor and all but one item loads
on it with a loading greater than 0.3. F1 seems to be anchored most directly
by the production items. Thus, it seems clear that it represents the construct
that we sought to measure.

F2, accounting for 7.1% of the total variance, is of little substantive
interest. The product-moment correlation of the loadings in the F2 column with
the difficulties of the items is -0.88; thus, this factor should be considered
to merely represent item difficulties and be essentially devoid of substantive
interest. F3 and F4, representing 2.8 and 2.5 percent of the variance respectively,
seem to involve primarily the comprehension subtest. The six highest
loadings on F3 are all on items in that test, as are the four highest loadings
on F4. A more extensive interpretation of these factors is not obvious. F5
again involves the comprehension items, with its primary anchor being item #7
and little else loading on it.

Given the highly dominant first factor in this solution along with the
presence of several minor factors which were apparently either unrelated to the
content of the test or relatively uninterpretable, the decision was made to use
each child's score on the first factor as his test score. Thus, factor scores
corresponding to F1 were computed for all children and these were then used in
all subsequent analyses as representing the children's performances on the test.
These factor scores will be referred to as FCTR; FCTR is scaled with a mean of
zero and a standard deviation of one (over the entire sample of scores).

Adults' Analysis. The 41 items in the adult test were entered as variables into
a principal components analysis. As with the child analysis, components with eigenvalues
greater than 1.0 were retained and rotated using the quartimax procedure.
There were four such components (factors) and their relative importance can be
described by the sizes of their respective eigenvalues:

Factor eigenvalue Percent of total variance 

1 18.7 45.5 

2 2.0 5.0 

3 1.8 4.3 

4 1.2 2.8 

Together, the four factors represented 57.6% of the total variance of the
41 items. The rotated factor matrix is given in Table 2, together with the means
of the items and the h² corresponding to each item.

The factor structure has some similarity to the structure found for the children's
test. In both cases the production test appeared to anchor the first factor
while the comprehension test showed the weakest properties. In the ACT the average
h² was lower than in either the APT or the OCT, indicating the likelihood that its
items were of lower reliability. This conclusion is reinforced by the pattern of
ACT means. All except one fall between .43 and .53. This is particularly significant
when one considers that the ACT items were all two-choice, and thus would have
expected means of .50 if all responses were randomly made. Therefore respondents
did somewhat worse than chance on the test as a whole. The ACT items appear to
load both on F1 and F2; however, they load more highly on F1 (mean loading=.40)
than on F2 (mean loading=.28).

F1 is clearly interpretable as a general English proficiency factor. Its
variance is over nine times the variance of the next most important factor, and
the great majority of the items in the test (37 out of 41) were principally identified
with it. Therefore, the factor score corresponding to the first factor was
computed for each respondent, and this score was used in all subsequent data analyses
as that individual's test score. Conceptually, the factor score (referred to, as
in the children's analysis, as FCTR) can be thought of as a purer measure of the
central construct under investigation than is the raw total number of points obtained.
However, in this particular case, there was little real choice between the
two measures, since in the total sample they correlated .986 and in no ethnic group
did they correlate less than .973. As in the case of the child factor scores, the
adult FCTR scores were standardized over the entire sample with a mean of zero and
a standard deviation of one.






Table 2: Principal Components Analysis of Adults' Test Data. Quartimax Rotation.
(Given to two decimal places, decimal points deleted.)

Item  Subtest   Mean    F1    F2    F3    F4    h²

  1   ACT         53    31    41    28   -03    35
  2               46    20    35    37    22    35
  3               53    46    25    23    18    36
  4               48    32    01   -00    54    41
  5               47    49    38    24   -11    45
  6               43    53    40    23   -10    51
  7               44    46    26    34   -12    41
  8               25    00    41    53    02    45
  9               47    61   -00   -25    07    44
 10               44    57    36    15   -21    52
 11   APT         97    76    11   -13    15    63
 12              106    80    08   -10    14    67
 13              103    78    11   -11    18    67
 14               98    78    08   -14    23    69
 15              109    81    07   -12    20    72
 16               98    78    09   -11    18    66
 17               96    76    07   -13   -11    61
 18               74    80    12   -23   -01    70
 19               64                            61
 20               70    77    15   -24   -27    75
 21               72    75    15   -22   -29    71
 22               63    75    14   -31   -19    72
 23               67    73    14   -28   -26    69
 24               90    77    09   -16    17    65

Table 2 continued

Item  Subtest   Mean    F1    F2    F3    F4    h²

 25               93    80    06   -13    16    69
 26               93    79    07   -15    14    67
 27   OCT         51    66   -23    18   -05    52
 28               45    63   -20    11    09    46
 29               47    65   -20    11    03    47
 30               45    67   -20    15   -01    51
 31               49    67   -25    16   -00    53
 32               42    69   -18    10   -14    54
 33               57    73   -27    18    07    64
 34               48    66   -26    16    03    54
 35               57    77   -29    20    01    71
 36               52    74    24    16   -04    64
 37               30    59   -09    12   -25    43
 38               48    72   -28    12   -10    62
 39               49    74   -23    14   -04    62
 40               47    75   -21    13   -08    63
 41               50    75   -27    17   -10    68

2. The School Lists

The school list information had two vital strengths relative to its use as a
MELP variable.

1. It is very close in definition and purpose to the legislative definition
of LESA and can (for some school districts) be directly interpreted as the
LEA's way of identifying LESA and non-LESA children.

2. It is inherently categorical rather than continuous in nature and thus
provides an excellent guide by which to determine a cut-off point on some
continuous MELP measure (e.g., a discriminant or regression function).

Unfortunately, however, such school information has one large disadvantage: it
is completely locally defined and it is unlikely that any two LEAs will categorize
children in just the same way. (This, of course, is a characteristic of the United
States' decentralized school system.)

School Lists: Children. The particular school districts from which the present
samples were drawn were recommended because they had exemplary screening procedures
and/or curricula for children of non-English language backgrounds; but each used
its own procedure for determining if a child was to be considered LESA or not. A
relatively brief sketch of the procedure used by each school is given below:

A. Dade County Public Schools (Miami): Upon registering for the first
time in school, each child with a background involving a language other
than English (as determined informally by the registration clerk) is
usually interviewed by a specialist in the field of English as a Second
Language (ESL). As a result of that interview, the child is categorized
as non-independent or as independent in English, or, if the results of
the interview are not clear cut, he is given an additional assessment in
the form of a test, either the Aural Comprehension Test or the Thumbnail
Test (both locally developed). An intermediate category contains
children who are not clearly in either the independent or the non-independent
categories. Children who are "independent" in English are considered
to be able to function independently in a monolingual English
school setting without supplementary materials or instruction in another
language; this is clearly the concept of being non-LESA as defined legislatively.
The categories of "non-independent" and "intermediate" are
also clearly LESA according to their definitions. Thus, in all analyses
reported here involving school lists in Dade County, independent children
were categorized as non-LESA and all others as LESA.

B. El Paso: Children were classified as either Spanish Dominant or English
Dominant based on their relative performances on parallel forms of a
locally-developed grammar test in English and Spanish. Children scoring
at the top of both tests or at the bottom of both tests were not on our
lists at all. Classification was made on the basis of the difference
between the two test scores. A child scoring higher on the Spanish test
than on the English test was categorized as "Spanish dominant", while
a child scoring higher in English than in Spanish was categorized as
"English dominant". In the present analyses, "Spanish dominant" was
equated with LESA and "English dominant" was equated with non-LESA.
While it would have perhaps been better from the point of view of the
project to simply use scores on the English test to define the lists,
this was not how El Paso screened its children, and such scores were not
available to us in any case.

C. Arizona: Navajo children were taken from two school districts, Window
Rock and Ganado. The districts used very different classification
procedures:

1. Window Rock: Although Window Rock does not routinely
classify children on English proficiency, they devised lists for us by
using scores from the comprehension section of the Gates-McGinitie Reading
test. All those scoring below their grade level were placed on the "low"
list. Thus, those scoring below grade level were interpreted as being
LESAs in the present analysis while those scoring at or above grade level
were categorized as non-LESA. Although this categorization procedure was
initially considered to be marginally relevant to the LESA concept, subsequent
examination of the relationship of the Window Rock lists with
the other variables in the study led us to discard the information entirely.
Details are given in Appendix 5.

2. Ganado: Ganado relied mainly on teacher ratings, but also used the
same Thumbnail test (10 completion items) that was used in Miami. Ganado
had three categories labeled non-independent, intermediate, and independent.
Their meanings appeared to be the same as in Miami, and they were
interpreted the same as were the Miami lists relative to LESA and non-LESA.

D. San Francisco: San Francisco's classifications were apparently made by
the child's teacher after a few weeks of school in the fall. No formal
assessment procedure was followed. The classification was dichotomous
with categories labeled limited English and non-limited English. It
should be pointed out that these lists were at least 9 months old when
our data were gathered. All other sites had updated their classifications
of the children within the three months previous to our data collection.
It should also be pointed out that all children in the San Francisco sample
were selected from the rosters of regular elementary schools and not
from the "Newcomers" or "Education" centers where many new arrivals spend
their first months in the U.S. Thus, it is likely that our San Francisco
sample did not include some of the most limited children.* That is, all
children in our group knew at least enough English to be judged able to
survive in a regular English-language school.

School List Information: Adults. Since primary emphasis for developing a MELP
was on children between the ages of 5 and 17 (see Chapter I), field test sites were
chosen primarily on the basis of availability of school lists for children and only
secondarily on the basis of school list availability for adults. As a consequence,
such information was only obtainable for adult samples in Dade County and El Paso
and not for adults in Arizona and San Francisco. Therefore, lists could be used
as a criterion variable for adults only in Dade County and El Paso. The definitions
of the samples are as follows:

School List Information in El Paso - The list information which was available
for Chicano adults appeared somewhat suspect for reasons that follow. Upon detailed
investigation, CAL discovered that the El Paso lists had not been constructed
in any direct way from the results of screening procedures. Rather, they represented
current enrollments of individuals in either beginning or advanced ESL
classes. Unfortunately, the relation between English proficiency and the level of
the class in which the respondent was enrolled appeared to be relatively uncertain.
The selection of a particular ESL class by a potential student was always voluntary.
Although ESL teachers were available to help people choose the correct class for
their ability level, many times the choice was determined by convenience of meeting
times, level of the student's aspirations, etc. Given this situation, it would be
reasonable to expect that the level of the class in which an individual was enrolled
would not be highly related to other indices of the individual's English proficiency,
the MELP questions and FCTR scores in particular.

* We believe that children who are most limited in English proficiency are not
difficult to identify with MELP-type questions. It is those children with some
English proficiency whose identification is most problematical.



The product-moment correlations between El Paso list placement and the MELP
variables are given below and compared with the correlations of the MELP variables
and FCTR.

MELP Variable    correlated with LIST    correlated with FCTR

WHEN                     .10                     .07
SPEAK                    .10                     .53
UNDERSTAND               .07                     .53
KID                      .05                     .14
FRIEND                   .01                     .18
HLANG                    .04                     .08
YEARS                    .06                     .39
NEWS                    -.17                    -.32
BIRTH                   -.02                     .05
GRADE                    .03                     .19
INCOME                  -.08                     .15
FCTR                     .17                    1.00



It can be seen that the correlations of the predictors with FCTR were higher
than with list in all but one case. The multiple correlation between all eleven
predictors and list was .23 while it was .65 between them and FCTR.
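
The "all but one case" comparison can be verified directly from the tabled values. The Python sketch below copies the eleven predictor rows (the FCTR row is left out) and compares the correlations by magnitude; the variable names in the code are hypothetical.

```python
# (correlation with LIST, correlation with FCTR) for the El Paso adults,
# copied from the table of MELP variable correlations.
correlations = {
    "WHEN": (0.10, 0.07),
    "SPEAK": (0.10, 0.53),
    "UNDERSTAND": (0.07, 0.53),
    "KID": (0.05, 0.14),
    "FRIEND": (0.01, 0.18),
    "HLANG": (0.04, 0.08),
    "YEARS": (0.06, 0.39),
    "NEWS": (-0.17, -0.32),
    "BIRTH": (-0.02, 0.05),
    "GRADE": (0.03, 0.19),
    "INCOME": (-0.08, 0.15),
}

# Compare magnitudes, since the sign only reflects the direction of coding.
higher_with_fctr = [name for name, (r_list, r_fctr) in correlations.items()
                    if abs(r_fctr) > abs(r_list)]
exceptions = [name for name in correlations if name not in higher_with_fctr]
print(len(higher_with_fctr), exceptions)
```

Ten of the eleven predictors are more strongly related to FCTR than to list placement; the single exception is WHEN.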

Thus it can be seen that not only was the list information in El Paso not the 
result of a direct screening procedure for English proficiency, but it also was 
not related highly to any other measure of proficiency in our study. On this basis 
list information was discarded for adults in El Paso. 

School List Information in Miami - The situation in Miami was quite different.
The routine procedure in the Miami adult education program is for each potential
ESL student to be interviewed by an ESL specialist when enrolled. A preliminary
placement is then made and a follow-up interview is conducted three days later to
see if the classification was accurate. Students are encouraged to take tests to
help in placing them, but testing is not a required part of the screening procedure.

The following are guidelines for ESL interviewers in making initial placements
in Miami:

Beginning Level

1. Understands only limited conversation or none at all

2. Makes errors in using the most frequent grammatical structures

3. Speaks with significant distortions of words

4. Uses very limited vocabulary

Intermediate Level

1. Understands everyday speech when speakers choose words carefully or
restate ideas

2. Makes significant grammatical errors of interference

3. Speaks with significant distortion of words

4. Gropes for words and often has to rephrase to be understood

Advanced Level

1. Understands nearly everything a native speaker understands

2. Uses English with few grammatical errors

3. Speaks with minor distortions of pronunciation

4. Uses vocabulary comparable to that of native speakers

As can readily be seen, the description of the Advanced level clearly implies
no limitation in English while the other two imply limitations of varying
degrees; thus the Beginning and Intermediate levels were designated as LESA and the
Advanced list as non-LESA. Statistically, this classification scheme was more
closely related to the MELP predictors and FCTR than was the El Paso classification.
For Miami, the multiple correlation of the eleven MELP variables with list (dichotomized
into LESA - non-LESA) was .48 and the correlation of list with FCTR was
.51. Therefore, on both definitional and statistical grounds, the decision was made
to retain the list information in Dade County as a criterion variable.



3. The Direct Observation Rating Procedure (DORP)

As described in Chapter [...], the DORP was developed to serve as [...] to
derive and validate the MELP [...] test. Unfortunately, its development was
completed too late to use the instrument at all sites and to properly train
all [...] in its administration. As a result, relatively complete DORP data
were gathered only for the Cuban and Chicano groups, and thus the DORP could not
be used as a full-fledged criterion variable in the derivation of the [...].
[...] this section is to report analyses of what DORP data were collected and
its relationship to the other two criteria [...] Spanish-speaking [...]
both the test and the MELP variables in the sense that the DORP represents a
method of assessing English proficiency which is [...] represented [...] the test
and not directly represented in the lists. (Ratings by teachers or other school
personnel, on which some lists were based, could be thought of as being somewhat
similar to DORP ratings.) Moreover, the DORP does represent a method for assessing
language proficiency which is accepted as face-valid by many specialists.

Table 3 indicates the number of DORP ratings made within each ethnic group.






Table 3

Site                   Sample Size    DORP Ratings

Cuban children             317            307
Cuban adults               272            262
Chicano children           354            305
Chicano adults             202        [illegible]
Asian children             279        [illegible] *
(including Chinese)
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Table 3 continued 



Site 



Sample Size 



DORP Ratings 



Asian adults 
(including Chinese) 



227 



58 



Navajo children 



260 



61 



Navajo adults 



214 



46 



Since the number of DORP ratings made was relatively small in Arizona and
San Francisco, they were eliminated entirely from the analyses to be reported below
and only those involving Spanish-speaking respondents were used.

4. Relationships Among Criterion Measures

Table 4 gives the product-moment correlations among test total score, FCTR,
List, and DORP for the Chicano and Cuban children and for test total, FCTR, and
List for Chinese, Other Asians, Navajos and all children together. It should be
remembered that since List is dichotomous, all correlations with List are point
biserial coefficients and can thus be expected to be lower in magnitude than the
other coefficients (as indeed they are). Table 4 shows what we might expect with
three fallible measures of the same construct: that is, the correlations are
substantial, but nowhere near unity. Table 5, which gives the corresponding
correlations for Cuban adults, yields very similar results.

An alternate way of looking at the relationship between List and FCTR, our two
principal criteria, will be given in the last section of this chapter.

Table 4: Intercorrelations of Criterion Measures for Children

A. Cubans (N=307)
            Test    FCTR    DORP
   List     .43     .37     .46
   Test             .93     .72
   FCTR                     .66

B. Chicanos (N=306)
            Test    FCTR    DORP
   List     .61     .60     .55
   Test             .93     .71
   FCTR                     .65

C. Chinese (N=146)
            Test    FCTR
   List     .40     .32
   Test             .86

D. Other Asians (N=133)
            Test    FCTR
   List     .31     .26
   Test             .72

E. Navajos (Ganado only)
            Test    FCTR
   List     .31     .30
   Test             .88

F. Overall (N=1098)
            Test    FCTR
   List     .45     .43
   Test             .92


Table 5: Correlations for Cuban Adults: Criterion Measures

            Test    FCTR    DORP
   List     .52     .49     .48
   Test             .98     .73
   FCTR                     .72

The question has been raised about the degree to which DORP and FCTR combined
might make a criterion variable more valid and reliable than either is alone. To
obtain an idea of that, each child's FCTR and DORP scores were simply added
together after having been standardized within group. The multiple correlation
coefficients of these composite variables with the 10 MELP variables were .73 and
.82 for Cubans and Chicanos respectively. The corresponding multiple correlations
for FCTR alone are .67 and .73 respectively. Thus, within these groups, the use
of FCTR and DORP in combination might have been expected to control about 10% more
of the variance of the MELP. Had complete DORP data been obtained for all children,
such a combination would have been employed, resulting in somewhat better
performance figures for the scoring keys derived in Chapter VII. Whether the better
performance would have been due simply to greater reliability or also to greater
validity of the criterion variable it is impossible to say.

With respect to adults, the situation was slightly different. Again, FCTR and 
DORP scores were both standardized within group and then added together for each 
individual. As in the above analysis, this composite variable was then used as the 
criterion in a multiple regression analysis with the MELP variables as predictors. 
The multiple correlation coefficients were .70 and .64 for the Cuban and Chicano 
groups respectively. They compare with .69 and .65 respectively when FCTR is used 
alone. This indicates that for adults little if any additional performance would 
be gained by a MELP if it were derived using a combination of FCTR and DORP as a 
criterion. Certainly, DORP ratings alone would not seem to be superior to FCTR
as a criterion except possibly on the basis of face validity alone.
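
The composite criterion described above (standardize FCTR and DORP within group, then add) can be sketched as follows. This is only an illustration: the scores are hypothetical, and the use of the population standard deviation is an assumption, since the report does not say which form was used.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert raw scores to z-scores within one group.

    Assumption: population standard deviation; the report does not specify.
    """
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

def composite_criterion(fctr, dorp):
    """Add the within-group z-scores of FCTR and DORP for each respondent."""
    return [f + d for f, d in zip(standardize(fctr), standardize(dorp))]

# Hypothetical scores for five respondents in one group (not study data).
fctr_scores = [0.10, 0.45, 0.80, -0.30, 0.55]
dorp_scores = [2.0, 3.0, 4.5, 1.5, 3.5]
composite = composite_criterion(fctr_scores, dorp_scores)
```

Standardizing within group before adding gives the two measures equal weight in the composite regardless of their original scales.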

Dichotomizing FCTR. Because the objective of this study was to develop a measure
of a dichotomous characteristic, it was necessary to convert FCTR from a continuous
variable into a dichotomous one before it could usefully serve as a criterion
measure in the derivation of a scoring key for the MELP. This amounted to defining
a cutting point on the FCTR scale such that all children having scores below that
value would be considered LESA (as far as test results were concerned) and all
children scoring at or above that value would be considered non-LESA. But how
could that cutting point be determined in a non-arbitrary way? Since norms had not
previously been computed for this test, there was no way to interpret what a given
score meant relative to any known group distributions. Neither was the test
constructed to be criterion-referenced, so inspection of the contents of the items
did not help to determine what score ranges might be called LESA and non-LESA
respectively. The only link from the test to a dichotomy was the fact that the
respondents had taken the test and had been classified LESA or non-LESA by schools.
The solution employed, then, was to assume that the schools had given us the correct
number of children who were LESA in the sample, even if they had not been correct
in their categorization of every individual child. (This is equivalent to assuming
that the schools made as many false positive diagnoses of LESA as they did false
negatives.) The cutting point on the test was then determined by placing it such
that the same number of children (approximately) were characterized as being LESA
by the test as by List. For example, among Cubans, there were 210 children on the
LESA school lists out of 317 children. The FCTR cutting point was chosen for Cubans,
then, so that the 210 children who scored lowest on FCTR were LESA and the highest
107 were non-LESA. That cutting point was +.45 on the FCTR scale (or approximately
54 in terms of total test points). This procedure was carried out for each group
individually and for the entire sample of 1098 as a whole. The FCTR cutting points
are given in Tables 6b, 7b, 8b, 9b, 10b, and 11b and ranged from .18 for Chicanos
to .63 for Navajos. One way to interpret this range is to ascribe it to differences
in the criteria which the schools implicitly or explicitly used in making their
classifications. It may be that a child knowing just enough English to score .30
on FCTR would be assigned to the "English dominant" group in El Paso but to the
"limited" group in San Francisco or the "non-independent/intermediate" group in
Miami or Ganado. Another interpretation of the differences is that there was a
test-culture interaction. Under this interpretation, Chicano children scored
systematically lower on the test than did, say, Navajo children even though they
had the same English proficiency, presumably because the test discriminated against
Chicanos in non-linguistic ways. Although it is not possible to dismiss the latter
possibility, precautions against it were taken by having representatives from all
the ethnic groups criticize the test in detail and suggest alternative, more
"culture-fair" forms.

It should be noted that there are other possible approaches which could be used 
in dichotomizing FCTR. One would be to determine a cutting point by examining the 
contents of the various test items and deciding, in consultation with teachers or 
other specialists, what minimum performance would be necessary to consider a person
as being LESA. Another would be to choose the cutting point which would minimize 
the number of individuals for which classification by list and by FCTR disagreed. 
The former method was not pursued because of the difficulties in arriving ration- 
ally at such a cutting point in a non-arbitrary way. The latter method was explored 
and found to yield results very similar to those of the procedure which was employed. 
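
The count-matching rule used to dichotomize FCTR can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the scores are hypothetical, and ties at the cut are ignored (the report itself says the match was only approximate).

```python
def matching_cut_point(scores, n_lesa):
    """Return a cut such that the n_lesa lowest scores fall below it.

    Scores below the cut are treated as LESA; scores at or above it
    as non-LESA, following the rule stated in the text.
    """
    ordered = sorted(scores)
    if not 0 < n_lesa < len(ordered):
        raise ValueError("n_lesa must be between 1 and len(scores) - 1")
    return ordered[n_lesa]

def classify(score, cut):
    """Dichotomize one score against the cut."""
    return "LESA" if score < cut else "non-LESA"

# Hypothetical scores; suppose the school lists name 4 of these 6 children as LESA.
scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
cut = matching_cut_point(scores, 4)
labels = [classify(s, cut) for s in scores]
```

For the Cuban children, the same idea with 317 scores and 210 listed LESA children corresponds to the reported cut of +.45.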

Adults. While the same logic was used in dichotomizing FCTR for adults as was used
for children, the procedure was only possible for the Cuban group since that was
the only group for which useful list classifications were available. Thus, a cutting
point was established only for Cubans and then simply assumed to be valid for the
other groups. The cutting point arrived at was +0.1, corresponding to a total test
score of approximately 29. When the cutting point of +0.1 was applied to each of
the adult samples, the following numbers and proportions of individuals fell into
the LESA and non-LESA categories:

                 Overall         Cubans          Chicanos
                 N     Prop.     N     Prop.     N     Prop.
   LESA         444    .49      185    .68      160    .79
   non-LESA     471    .51       87    .32       42    .21
   Total        915   1.00      272   1.00      202   1.00

                 Chinese         Other Asians    Navajo
                 N     Prop.     N     Prop.     N     Prop.
   LESA          56    .50       17    .15       26    .12
   non-LESA      55    .50       99    .85      188    .88
   Total        111   1.00      116   1.00      214   1.00

Although the overall proportions of LESA and non-LESA individuals are
approximately equal within the sample of all adults taken as a whole, the proportions
within the ethnic groups vary widely, from 79% LESAs among Chicanos to 12% among
Navajos. Therefore, it must be kept in mind that in the present study 78% of all
LESAs were Spanish speakers and only 27% of all non-LESAs were Spanish speakers.
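
The two percentages quoted above follow directly from the adult counts in the preceding table. A short check, using those counts and treating the Cuban and Chicano groups as the Spanish speakers (the variable names are illustrative only):

```python
# Adult counts by group, taken from the table of LESA / non-LESA categories.
lesa = {"Cubans": 185, "Chicanos": 160, "Chinese": 56,
        "Other Asians": 17, "Navajo": 26}
non_lesa = {"Cubans": 87, "Chicanos": 42, "Chinese": 55,
            "Other Asians": 99, "Navajo": 188}
SPANISH_GROUPS = ("Cubans", "Chicanos")

def spanish_share(counts):
    """Proportion of a category made up of the two Spanish-speaking groups."""
    spanish = sum(counts[g] for g in SPANISH_GROUPS)
    return spanish / sum(counts.values())

lesa_share = spanish_share(lesa)          # 345 / 444
non_lesa_share = spanish_share(non_lesa)  # 129 / 471
```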



5. The Correspondence between List and Dichotomized FCTR

Since both List and FCTR will be used in subsequent chapters as criteria
against which to derive scoring keys for the MELP variables, it is important to
explore the degree of agreement between these two measures themselves. If they are
highly redundant with each other, then it is likely that a given MELP scoring key
will yield LESA - non-LESA categorizations which will agree with both criteria or
with neither. However, to the extent that the two criteria are themselves not
highly correlated, the possibilities become more complex. The MELP might be more
highly in agreement with one criterion than with the other or it might be moderately
correlated with both. Given two relatively uncorrelated criteria, a moderate
correlation with both would seem preferable since we have already taken the position
that the two criteria represent different ways of indexing the LESA - non-LESA
distinction and that there is no consensus that one is "better" than the other.
Since the point biserial correlations already reported between List and FCTR were
relatively low (.43 for all children pooled and .48 for Cuban adults), we can expect
that the correspondence between dichotomized FCTR and List will not be particularly
high either.

Table 6 gives the four-fold tables of classification for children in each
ethnic-linguistic group. The frequencies in the upper-left and lower-right cells
of each table represent individuals for whom list classification and dichotomized
FCTR classification agreed, while the frequencies in the lower-left and upper-right
cells represent disagreements between the two systems. "% agreement" is
the sum of the agreements over the total number of individuals in the table. An
inspection of these numbers immediately confirms our expectations, that the degree
of association between these two measures, although substantial, is not as high as
would be desired for alternative criteria to be used in the derivation of a single
measure. Also, the agreement is substantially higher for the two Spanish-speaking
populations than for the other groups. These considerations must be kept in mind
throughout the presentations in Chapters VII and VIII.



Table 6: Agreement between dichotomized FCTR and School List.

A. Cubans (FCTR cut pt. = .45)                        73% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA             166              43           209
   non-LESA          44              64           108
   Total            210             107           317

B. Chicanos (FCTR cut pt. = .18)                      84% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA             161              29           190
   non-LESA          30             144           174
   Total            191             173           364

C. Chinese (FCTR cut pt. = .41)                       65% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA              67              25            92
   non-LESA          26              28            54
   Total             93              53           146

D. Other Asian (FCTR cut pt. = .54)                   60% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA              29              29            58
   non-LESA          24              51            75
   Total             53              80           133

E. Navajos (Ganado only) (FCTR cut pt. = .63)         63% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA              69              25            94
   non-LESA          26              18            44
   Total             95              43           138

F. All Children (FCTR cut pt. = .43)                  72% agreement

   FCTR          List: LESA    List: non-LESA    Total
   LESA             487             153           640
   non-LESA         155             303           458
   Total            642             456          1098



VII: Derivation of LESA Categorization Procedures for Children

In Chapter V, a set of ten MELP variables was defined; these were the
quantified responses to the MELP questions. However, for any given individual these
ten variables were a long way from a single categorization as being either LESA or
not. The subject of this chapter is the development of "scoring keys" by which
a child can be assigned to the category LESA or not on the basis of his or her
values on the MELP variables. Two approaches were taken. The first was to use
discriminant analysis. This procedure combines a set of discriminating variables
(the MELP variables) in a linear discriminant function such that the resulting
composite variable maximally discriminates between the two values of a dichotomous
criterion variable (either List or FCTR). The discriminant analysis procedure
derives the discriminant function, which includes a weighting coefficient for
each predictor variable, in such a way that the total number of categorization
agreements between the discriminant function and the criterion variable is
maximized; conversely, the total number of "errors" of classification made by the
discriminant function relative to the criterion is minimized. The second approach
to a scoring key was simply to postulate explicit operational definitions of the
LESA and non-LESA categories in terms of the MELP variables and then test the
agreement of these definitions against the LESA and non-LESA categories as defined by
one or another of the criterion variables. Each of these approaches will be explored
in turn.

1. The Evaluation of MELP-Based Definitions of LESA and non-LESA

A categorization procedure based on the MELP variables, be it a discriminant
function or simply an ad hoc definition, when compared with the categorization of
the same respondents by either List or FCTR*, yields a four-fold table which
characterizes the amount of correspondence between the two systems. Such four-fold
tables and statistics derived from them will form the basis of our evaluations
and comparisons of various possible scoring keys. Consider Table 1 below:

   * In this chapter, "FCTR" always means "dichotomized FCTR." See Chapter VI for
     details.


Table 1:

                         Criterion Categorization (assumed correct)
   MELP-based
   Categorization        LESA        non-LESA      Total
   LESA                   A             B           A+B
   non-LESA               C             D           C+D
   Total                 A+C           B+D        A+B+C+D



Such a table compares categorization by discriminant function with categorization
by criterion. If A, B, C, and D represent the frequencies in the above cells, A and D
represent those in the total sample which are categorized the same by both the
criterion and the discriminant function. Clearly, the larger A + D, the more
effective is the discriminant function in predicting the "correct" categorizations
of the individuals in the sample. On the other hand, for the purposes of this
study, the crucial objective of a scoring key is to correctly estimate the
proportion of LESAs in a population. This is not necessarily the same as minimizing
the total number of errors of classification. To achieve the former objective,
the frequencies in cells B and C must be roughly equivalent to each other or
balanced, while to achieve the latter, B + C is minimized. Thus, it is not
necessarily the case that a discriminant function will produce the same marginal
frequencies (i.e., A + B and C + D) as the criterion categorizations (A + C and B + D)
even for the data set from which it was derived.

In evaluating the performance of any scoring key, two kinds of indices are
important. One measures the accuracy of the scoring key in terms of the
proportion categorized the same by scoring key and by criterion measure. This will
be referred to as "% categorized the same by criterion and MELP." It will equal
(A + D)/(A + B + C + D). The other measure is of the agreement between the
proportions identified as LESA by the criterion and by the scoring key. It is the
difference between the two proportions divided by the proportion identified as LESA
by the criterion. In terms of Table 1, it is (B - C)/(A + C) and will be denoted as
"% bias." Negative values indicate that the scoring key underestimates the number of
LESAs while positive values indicate overestimation.
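
The two indices can be computed directly from the four cell frequencies of Table 1. The sketch below is illustrative (the function name is hypothetical); the example frequencies are those reported in this chapter for the overall sample with List as criterion (A=497, B=111, C=145, D=345).

```python
def evaluate_key(a, b, c, d):
    """Return (% categorized the same, % bias) for a four-fold table.

    a: LESA by both MELP and criterion
    b: LESA by MELP, non-LESA by criterion
    c: non-LESA by MELP, LESA by criterion
    d: non-LESA by both
    """
    total = a + b + c + d
    pct_same = 100 * (a + d) / total
    # (A+B)/N minus (A+C)/N, divided by (A+C)/N, reduces to (B-C)/(A+C).
    pct_bias = 100 * (b - c) / (a + c)
    return pct_same, pct_bias

# Overall sample, List as criterion.
same, bias = evaluate_key(497, 111, 145, 345)
```

Rounded to whole percentages, these values reproduce the 77% agreement and -5% bias listed for the overall sample in Table 2. Note that a key can have B and C both large (low accuracy) and still show near-zero bias if they happen to balance.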



2. Discriminant Analyses; Child Data 

Two discriminant analyses were performed on the data from each ethnic group,
one using school list as the criterion and the other using FCTR. Such analyses
were done separately for each of the five ethnic groups and also for all groups
pooled into a single sample. In all cases the same ten MELP variables were used
as discriminators. All analyses were done using the SPSS system.

Table 2 gives the overall accuracy of classification of each discriminant
function relative to its particular population and its particular criterion.
Accuracy is expressed both as the percent of the group classified in the same
category by both the discriminant function (MELP) and the criterion and in terms of
the disparity between the proportions classified as LESA by both (% bias).
Tables 3-8 give the actual cross-tabulations of classifications by each
procedure (criterion vs. MELP) for each group. Percentages in each cell represent
percent of the column. For example, in Table 3a, 497 children were categorized
LESA by both List and MELP, 145 were categorized LESA by List and non-LESA by MELP,
etc. Of the 642 categorized LESA by List, 497 of them constitute 77% while 145
make up the remaining 23%. Tables 9 and 10 give the discriminant functions used
in the MELP categorizations of Tables 3-8. The functions in Table 9 define the
MELPs used in Tables 3a, 4a, 5a, 6a, 7a, and 8a while those in Table 10 define the
MELPs in Tables 3b, 4b, 5b, 6b, 7b, and 8b.

It is clear from Table 2 that while List and FCTR are different from each
other (see Chapter V), the MELP variables predict to each with relatively equal
accuracy.

Table 2: Performance of discriminant functions derived within each group and across
groups to classify the same group. Children's data.



                               Overall       Cubans        Chicanos      Chinese       Other Asians
                              LIST  FCTR    LIST  FCTR    LIST  FCTR    LIST  FCTR    LIST  FCTR

% classified the same
by Criterion and MELP          77    78      78    75      87    85      75    73      73   [...]

% classified LESA by
Criterion                      58   [...]    66    66      52    52      64    64      40   [...]

% classified LESA
by MELP                        55    56      58    58      57    54      55    58      38   [...]

% Bias                         -5    -4     -12   -13     +10    +4     -13   [...]   [...]  [...]

[...] = entry not legible in the source copy; the Navajo columns are not legible.


Table 3: Overall Sample: Accuracy of overall discriminant functions.



A. LIST as criterion

   MELP (Discr. funct.)    LIST: LESA     LIST: non-LESA    Total
   LESA                     497 (77%)       111 (24%)         608
   non-LESA                 145 (23%)       345 (76%)         490
   Total                    642             456              1098

B. FCTR as criterion (cut = .43)

   MELP (Discr. funct.)    FCTR: LESA     FCTR: non-LESA    Total
   LESA                     504 (79%)       108 (24%)         610
   non-LESA                 136 (21%)       350 (76%)         488
   Total                    640             458              1098


Table 4: Cubans: Accuracy of Cuban discriminant functions.

A. LIST as criterion

   MELP (Discr. funct.)    LIST: LESA     LIST: non-LESA    Total
   LESA                     162 (77%)        23 (21%)         185
   non-LESA                  48 (23%)        84 (79%)         132
   Total                    210             107               317

B. FCTR as criterion (cut = .45)

   MELP (Discr. funct.)    FCTR: LESA     FCTR: non-LESA    Total
   LESA                     156 (75%)        27 (25%)         183
   non-LESA                  53 (25%)        81 (75%)         134
   Total                    209             108               317


Table 5: Chicanos: Accuracy of Chicano discriminant functions



A. List as criterion

   MELP (Discr. funct.)    LIST: LESA     LIST: non-LESA    Total
   LESA                     166 (87%)        30 (18%)         196
   non-LESA                  25 (13%)       143 (82%)         168
   Total                    191             173               364

B. FCTR as criterion (cut = .18)

   MELP (Discr. funct.)    FCTR: LESA     FCTR: non-LESA    Total
   LESA                     166 (87%)        31 (18%)         197
   non-LESA                  24 (13%)       143 (82%)         167
   Total                    190             174               364


Table 6: Chinese: Accuracy of Chinese discriminant functions

A. List as criterion

   MELP (Discr. funct.)    LIST: LESA     LIST: non-LESA    Total
   LESA                      69 (74%)        12 (23%)          81
   non-LESA                  24 (26%)        41 (77%)          65
   Total                     93              53               146

B. FCTR as criterion (cut = .41)

   MELP (Discr. funct.)    FCTR: LESA     FCTR: non-LESA    Total
   LESA                      69 (75%)        16 (30%)          85
   non-LESA                  23 (25%)        38 (70%)          61
   Total                     92              54               146


Table 7: Other Asians: Accuracy of Other Asians discriminant functions.



A. List as criterion

   MELP (d.f.)             LIST: LESA     LIST: non-LESA    Total
   LESA                      34 (64%)        17 (21%)          51
   non-LESA                  19 (36%)        63 (79%)          82
   Total                     53              80               133

B. FCTR as criterion (cut = .54)

   MELP (d.f.)             FCTR: LESA     FCTR: non-LESA    Total
   LESA                      36 (68%)        17 (28%)          53
   non-LESA                  22 (32%)        58 (72%)          80
   Total                     58              75               133


Table 8: Navajos: Accuracy of Navajo discriminant functions (Ganado only)

A. List as criterion

   MELP (d.f.)             LIST: LESA     LIST: non-LESA    Total
   LESA                      66 (69%)        12 (28%)          78
   non-LESA                  29 (31%)        31 (72%)          60
   Total                     95              43               138

B. FCTR as criterion (cut = .63)

   MELP (d.f.)             FCTR: LESA     FCTR: non-LESA    Total
   LESA                      73 (78%)         8 (18%)          81
   non-LESA                  21 (22%)        36 (82%)          57
   Total                     94              44               138


Scoring Key

For each of the two types of discriminant analyses discussed above, six separate
scoring keys were derived: one for each of the five ethnic groups and a sixth for
all the groups combined. Each scoring key was simply a linear equation with the
terms being the MELP variables and the coefficients being the unstandardized
coefficients given in Tables 9 and 10. Such equations yield a single value for each
individual. If that value is above the cutting point (see Tables 9 and 10), the
individual is in one category; if it is below the cutting point he is in the other.
For example, consider the discriminant function for the Cuban children. It is:

   Y = - .14*WHEN - .39*SPEAK - .02*UNDERSTAND - .04*SIB - .10*FRIEND
       [sign illegible] .24*HLANG - .31*YEARS - .03*BIRTH - .06*GRADE
       - .04*PARENT + 5.07

For any Cuban child, if Y is less than -.19, then he or she is categorized as
non-LESA. If Y is equal to or greater than -.19, then he or she is LESA.
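
Applying a scoring key of this form is mechanical: compute the weighted sum and compare it with the cutting point. The sketch below illustrates the rule only. The weights, constant, and cutting point are hypothetical, two-variable stand-ins, not the coefficients of Tables 9 and 10.

```python
def discriminant_score(responses, weights, constant):
    """Weighted sum of MELP variable values plus a constant term."""
    return constant + sum(weights[name] * responses[name] for name in weights)

def categorize(y, cut):
    """Y at or above the cutting point is LESA; below it is non-LESA."""
    return "LESA" if y >= cut else "non-LESA"

# Hypothetical key for illustration only (not taken from the report).
weights = {"SPEAK": -0.4, "YEARS": -0.3}
constant = 2.0
cut = -0.2

child_a = {"SPEAK": 2, "YEARS": 1}   # low self-rated speaking, few years
child_b = {"SPEAK": 5, "YEARS": 5}   # high self-rated speaking, many years
label_a = categorize(discriminant_score(child_a, weights, constant), cut)
label_b = categorize(discriminant_score(child_b, weights, constant), cut)
```

Because the coefficients are negative, higher values on the predictors push Y downward, toward the non-LESA side of the cutting point.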

The five keys for the specific ethnic groups could be used by Census to
classify the SIE respondents who are members of these five specific groups as LESA or
non-LESA. However, there are many other ethnic groups which were not sampled in
this field work. What scoring key should be used to classify these respondents as
LESA or non-LESA? One possible scoring key is that derived from the combined data.

To check the accuracy of such a procedure relative to each ethnic group for
which data were available, the discriminant functions derived from the combined
groups were applied to each respondent's MELP variables to categorize that
individual as either LESA or not, and these categorizations were compared to the
criterion categorizations of both List and FCTR. The results are presented in
Tables 11 - 16. Comparing Table 11 with Table 2, it can be seen that between
[illegible] and 4% of accuracy is lost in each group, on the average, when a
discriminant function is used which was derived from all 1098 respondents (as opposed
to using a [illegible]). [illegible] overall discriminant function yields an average
absolute percent bias of 18 to 20%, while the comparable figure for the locally
derived discriminant functions is [illegible]. On an ethnic group by ethnic group
basis, using a single discriminant function to categorize all [illegible] results in
an average decrease of [illegible] to 4% in the number of respondents categorized the
same by MELP and Criterion and an average increase of 6% in the error of prediction
of the proportion of LESAs.

Table 11: Performance of overall discriminant function on each ethnic group:
Children's Data

                             Cuban        Chicano      Chinese      Other Asians   Navajo
                           List  FCTR   List  FCTR   List  FCTR   List  FCTR     List  FCTR

% classified the same
by Criterion and MELP       75    75     85    82     74    71     68    72       69    67

% classified LESA by
Criterion (from Table 2)    66    66     52    52     64    64     40    40       69    68

% classified LESA by
MELP                        72    64     53    60     57    66     20    24       57    43

% Bias                      +9    -2     +1   +16    -11    +4    -50   -40      -18   -38


Table 12: Cubans: Accuracy of overall discriminant functions.






   MELP                    List: LESA     List: non-LESA    Total
   LESA                     179 (85%)        49 (46%)         228
   non-LESA                  31 (15%)        58 (54%)          89
   Total                    210             107               317

   MELP                    FCTR: LESA     FCTR: non-LESA    Total
   LESA                     167 (80%)        38 (35%)         205
   non-LESA                  42 (20%)        70 (65%)         112
   Total                    209             108               317


Table 13: Chicanos: Accuracy of overall discriminant functions.

   MELP                    List: LESA     List: non-LESA    Total
   LESA                     165 (86%)        27 (16%)         192
   non-LESA                  26 (14%)       146 (84%)         172
   Total                    191             173               364

   MELP                    FCTR: LESA     FCTR: non-LESA    Total
   LESA                     173 (91%)        47 (27%)         220
   non-LESA                  17 (9%)        127 (73%)         144
   Total                    190             174               364


Table 14: Chinese: Accuracy of overall discriminant functions.

   MELP                    List: LESA     List: non-LESA    Total
   LESA                      69 (74%)        14 (26%)          83
   non-LESA                  24 (26%)        39 (74%)          63
   Total                     93              53               146

   MELP                    FCTR: LESA     FCTR: non-LESA    Total
   LESA                      73 (79%)        24 (44%)          97
   non-LESA                  19 (21%)        30 (56%)          49
   Total                     92              54               146


Table 15: Other Asians: Accuracy of overall discriminant functions.



   MELP                    List: LESA     List: non-LESA    Total
   LESA                      19 (36%)         8 (10%)          27
   non-LESA                  34 (64%)        72 (90%)         106
   Total                     53              80               133

   MELP                    FCTR: LESA     FCTR: non-LESA    Total
   LESA                      24 (45%)         8 (10%)          32
   non-LESA                  29 (55%)        72 (90%)         101
   Total                     53              80               133


Table 16: Navajos (Ganado only): Accuracy of overall discriminant functions.

   MELP                    List: LESA     List: non-LESA    Total
   LESA                      65 (68%)        13 (30%)          78
   non-LESA                  30 (32%)        30 (70%)          60
   Total                     95              43               138

   MELP                    FCTR: LESA     FCTR: non-LESA    Total
   LESA                      54 (57%)         5 (11%)          59
   non-LESA                  40 (43%)        39 (89%)          79
   Total                     94              44               138


3. Contingency Table Analysis and the Derivation of Explicit Operational
Definitions of LESA and non-LESA.

Two sorts of problems attend any attempt to use discriminant analysis to
produce a scoring key in the present project. The first is that it was impossible
to satisfy the statistical assumptions of this sort of analysis. Two such
assumptions are that the predictor variables are measured in an error-free way and
that they are continuous. The second problem is in the nature of the scoring key
produced by such methods. It is a linear equation which adds all predictor variables
in a weighted fashion into a single, continuous composite variable with a cutoff
point to define the categories LESA and non-LESA. Such a scoring key is totally
baffling to someone not familiar with multivariate analysis and not readily
interpretable even to those who are familiar with it. One of the common questions
asked by people attempting to understand how the MELP works is "What patterns of
answers to the questions identify a person as a LESA?" That is a fair question,
but quite unanswerable within the regression-discriminant analysis context. This
section describes the derivation of a scoring key that provides a ready answer to
the question. It seeks to enumerate exactly those response patterns (to the MELP
questions) defining the LESA category and those defining the non-LESA category.
The analysis consisted of two steps: the first involved reducing the number of
possible response patterns of the 10 MELP variables to a workable number (from the
over 30,000 possible patterns implied by the definitions in Chapter V); and the
second was to display the data in appropriately detailed contingency tables so that
the effectiveness of various definitions of LESA and non-LESA could be determined.

Reduction of the number of predictor variables 

Three strategies were used in reducing the number of possible response
alternatives to a manageable size: elimination of relatively redundant predictors,
reduction of the number of possible values of a given MELP variable, and viewing
[illegible].

As a first step, consider the [illegible]
with List and FCTR, as computed across all children (Table 17).

Table 17: Product-Moment Correlations; All Sites

                       1     2     3     4     5     6     7     8     9    10    11

    1. WHEN
    2. SPEAK          20
    3. UNDERSTAND     19    83
    4. SIB            25    48    44
    5. FRIEND         14    46    45    51
    6. HLANG          36    45    42    62    37
    7. YEARS         -06    32    28    16    17    14
    8. BIRTH          22   -17    15   -00   -04   -00   -68
    9. GRADE         -24    14    13    01    04   -02    68   -82
   10. PARENT        -00    30    29    32    29    32    03    10   -06
   11. LIST           22    50    46    45    38    46    19    03    03    27
   12. FCTR           14    55    53    42    47    37    42   -28    26    23    42



Variables [illegible] criteria would be early candidates for elimination.
[illegible] Highly redundant variables were [illegible] candidates for combination
or for the elimination [illegible] discarded because the variables were [illegible].
[illegible] examining the crosstabulation of SPEAK and UNDERSTAND [illegible], it
was decided to simply [illegible] the variables [illegible] of from 2 to 10, which
was called SPUND [illegible] patterns to nine.



A second composite variable was formed by combining the three variables based
on domains of language use -- HLANG, SIB, and FRIEND. The crosstabulation of the
three variables (reproduced below) indicates that they form a three-item Guttman
scale (Guttman, 1944).



HLANG=English
SIB 



HLANG=not English 
SIB 



English 
English 333 



FRIEND 



not English 20 



not English 
33 



English 
English 177 
FRIEND 



10 



not English 257 



not English 
26 



375 



The perfect scale types are:

Type 0: HLANG=not English, SIB=not English, FRIEND=not English
Type 1: HLANG=not English, SIB=not English, FRIEND=English
Type 2: HLANG=not English, SIB=English, FRIEND=English
Type 3: HLANG=English, SIB=English, FRIEND=English

94% of all responses were one of these perfect scale types. On the basis
of this analysis, the four-position scale USE was defined as the number of responses
of "English" given by a respondent to HLANG, SIB, and FRIEND. This reduced 27
possible response patterns to four with very little loss of information.
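
The scale check described above can be sketched in Python. This is only an illustration under stated assumptions, not the authors' procedure: the function names and the sample responses are hypothetical, and each item is simplified to a yes/no "English" response. Only the ordering rule (English at home implies English with siblings, which implies English with friends) and the definition of USE come from the text.

```python
def is_perfect_scale_type(hlang, sib, friend):
    """True if the pattern is one of the four perfect Guttman scale types.

    Each argument is True for "English" and False for "not English".
    """
    # English at home implies English with siblings, and
    # English with siblings implies English with friends.
    return (not hlang or sib) and (not sib or friend)


def use_score(hlang, sib, friend):
    """USE: the number of "English" responses among the three items (0-3)."""
    return int(hlang) + int(sib) + int(friend)


def proportion_scalable(responses):
    """Share of respondents whose pattern is a perfect scale type."""
    hits = sum(is_perfect_scale_type(*r) for r in responses)
    return hits / len(responses)


# Hypothetical (HLANG, SIB, FRIEND) responses, invented for illustration.
sample = [
    (False, False, False),  # no English responses
    (False, False, True),   # English with friends only
    (False, True, True),    # English with siblings and friends
    (True, True, True),     # English in all three domains
    (True, False, False),   # not a perfect scale type
]
print(proportion_scalable(sample))
print([use_score(*r) for r in sample])
```

With real data, a proportion near the 94% reported above would support replacing the three items by the single USE count.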

Finally, WHEN, BIRTH, GRADE, and PARENT were eliminated from the battery of
predictors on the basis of relatively low correlations with the criteria and low
beta-weights in the multiple regression analysis (see Appendix 6). This, then, left
three predictor variables: SPUND, USE and YEARS, with a total of 9 x 4 x 9 or 324
possible response patterns. To further reduce this number, YEARS was treated as
having 5 alternatives: 0 or 1, 2, 3, 4, and 5 or more. This resulted in 180
possible response patterns. The product-moment correlations among these variables
and with List and FCTR are given in Table 18. Table 19 presents the number of
respondents with each possible combination of SPUND, USE, and YEARS values and the
percent of them who were categorized as LESA by List. (For example, in Table 19,
there were children who had SPUND values in a 2 to 7 range and had a USE value of
zero and a YEARS value of zero or one, and 93% of them were LESA as determined by
List.)
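
The pattern counts quoted above can be checked with a short Python sketch. The helper names are hypothetical, and two points are assumptions rather than statements from the report: that SPEAK and UNDERSTAND are each coded 1 to 5 (which is consistent with SPUND running from 2 to 10), and that YEARS is available as a whole number before collapsing.

```python
from math import prod


def spund(speak, understand):
    """Composite of SPEAK and UNDERSTAND (each assumed to be coded 1-5)."""
    return speak + understand


def collapse_years(years):
    """Collapse YEARS to five alternatives: 0 or 1, 2, 3, 4, 5 or more."""
    if years <= 1:
        return "0 or 1"
    if years >= 5:
        return "5 or more"
    return str(years)


def n_patterns(levels):
    """Number of possible response patterns given the levels per variable."""
    return prod(levels)


# Distinct SPUND values when both items run from 1 to 5.
spund_values = {spund(s, u) for s in range(1, 6) for u in range(1, 6)}

print(len(spund_values))       # number of SPUND alternatives
print(n_patterns([9, 4, 9]))   # SPUND x USE x YEARS before collapsing YEARS
print(n_patterns([9, 4, 5]))   # after collapsing YEARS to five alternatives
```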

Table 18: Product-Moment Correlations: All Sites*

               2    3    4    5
1. SPUND      58   31   50   57
2. USE             20   53   51
3. YEARS                19   42
4. LIST*                     42
5. FCTR

* See footnote for Table 17.



Cochran and Hopkins (1961) give an algorithm for labeling each cell of such
a matrix as being either a LESA cell or a non-LESA cell so as to maximize the total
number of correct categorizations. Let p equal the proportion of LESA individuals
in the entire population of respondents; in this case p = 642/1098 or .58. Then,
if the proportion of LESAs in any given cell equals or exceeds that number, the
cell is labeled as a LESA response pattern.
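
A minimal Python sketch of this labeling rule follows. It implements only the threshold rule as stated in the paragraph above, not the full Cochran and Hopkins procedure; the function name, the cell keys, and the cell counts are hypothetical and invented for illustration. Only the value of p comes from the text.

```python
def label_cells(cells, p):
    """Label each response pattern as LESA or non-LESA.

    cells maps a response pattern to a pair
    (number of LESA respondents, total respondents) for that pattern.
    A cell is labeled LESA when its proportion of LESAs equals or exceeds p.
    """
    labels = {}
    for pattern, (n_lesa, n_total) in cells.items():
        if n_total == 0:
            labels[pattern] = None  # no data in cell
        elif n_lesa / n_total >= p:
            labels[pattern] = "LESA"
        else:
            labels[pattern] = "non-LESA"
    return labels


# Overall proportion of LESA children by List, as given in the text.
p = 642 / 1098

# Hypothetical cells keyed by (SPUND, USE, YEARS); the counts are invented.
cells = {
    ("2-7", 0, "0 or 1"): (46, 50),
    ("8", 2, "2"): (6, 16),
    ("10", 3, "5 or more"): (2, 21),
    ("9", 0, "4"): (0, 0),
}
for pattern, label in label_cells(cells, p).items():
    print(pattern, label)
```

The resulting set of labeled cells is itself the scoring key: a new respondent is categorized by looking up the label of his or her response pattern.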
Table 19: Percent LESA children by List for each combination of SPUND, USE and
YEARS. ( ) denotes n in cell.

YEARS = 0 or 1
                               USE
SPUND      0              1              2              3
2 - 7      .92 (52)       .76 (17)       .50 (4)        [illegible]
8          [illegible]    .81 (42)       .54 (24)       .16 (19)
9          [illegible]    1.00 (7)       .42 (12)       0 (14)
10         0 (1)          [illegible]    .22 (9)        .06 (47)

YEARS = 2
                               USE
SPUND      0              1              2              3
2 - 7      .98 (63)       .72 (25)       .63 (8)        .50 (2)
8          .78 (18)       .79 (29)       .38 (16)       .16 (19)
9          0 (2)          .50 (6)        0 (1)          .33 (9)
10         [illegible]    [illegible]    .40 (10)       .02 (41)

YEARS = 3
                               USE
SPUND      0              1              2              3
2 - 7      1.00 (19)      .68 (19)       .67 (15)       1.00 (1)
8          .56 (16)       .60 (16)       .70 (20)       .25 (8)
9          .50 (2)        .67 (6)        1.00 (1)       0 (4)
10         .50 (2)        .33 (3)        .38 (8)        .28 (18)


Table. 19 continued. 



SPUND 
2 - 7 

8 

9 

10 



SPUND 
2 - 7 

8 

9 

10 



0 

.88 (8) 
.76 (17) 
ND 

.50 (6) 



0 

.83 (12) 
.60 (10) 
1.00 (2) 
.40 (5) 



YEARS= 4 
USE 

1 2 
1.00 (4) 1.00 (2) 
.38 (13) jA^ (13) 



0 (2) 
.60 (10) 

YEARS > 4 
USE 

1 

.75 (4) 
.20 (15) 
.50 (6) 
.33 (15) 



j21 (8) 
.22 (9) 



2 

.09 (11) 
0 (3) 
.33 (12) 



3 

ND 

.29 (7) 
^ (6) 
.13 (15) 



3 

ND 

.50 (8) 
.33 (3) 
.10 (21) 



ND= No data in cell 
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All cells having a proportion of LESAs less than .58 are labeled "non-LESA". 
In Table 19 all non-LESA cells are underlined. Taking the resulting sets of LESA
and non-LESA cells and using them as a scoring key, the following agreement with
List categorization is obtained.

                            List
                    LESA        -LESA       Total
MELP      LESA      523 (81%)   109 (24%)    632
(p=.58)   -LESA     119 (19%)   347 (76%)    466
          Total     642         456         1098

% categorized the same by List and MELP = 79%
% categorized LESA by List = 58.5%
% categorized LESA by MELP = 57.6%
% Bias = -2%
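
The four summary figures can be recomputed from the cell counts of the table. The Python sketch below is illustrative; the function and argument names are hypothetical. It takes percent bias to be the difference between the two LESA percentages divided by the percent LESA according to the criterion, which is the definition the report states in the adult analysis (Chapter VIII).

```python
def summarize(both_lesa, key_only_lesa, criterion_only_lesa, both_non_lesa):
    """Summary statistics for a four-fold table of scoring key vs. criterion.

    both_lesa:            LESA by the key and by the criterion
    key_only_lesa:        LESA by the key, non-LESA by the criterion
    criterion_only_lesa:  non-LESA by the key, LESA by the criterion
    both_non_lesa:        non-LESA by both
    """
    total = both_lesa + key_only_lesa + criterion_only_lesa + both_non_lesa
    pct_same = 100 * (both_lesa + both_non_lesa) / total
    pct_lesa_criterion = 100 * (both_lesa + criterion_only_lesa) / total
    pct_lesa_key = 100 * (both_lesa + key_only_lesa) / total
    # Percent bias: difference between the two percents, relative to the
    # percent LESA according to the criterion.
    pct_bias = 100 * (pct_lesa_key - pct_lesa_criterion) / pct_lesa_criterion
    return pct_same, pct_lesa_criterion, pct_lesa_key, pct_bias


# Cell counts from the table above (MELP key vs. List).
same, by_list, by_melp, bias = summarize(523, 109, 119, 347)
print(round(same), round(by_list, 1), round(by_melp, 1), round(bias))
```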

While this compares very well with the performance of the scoring keys derived
by discriminant analysis, it is not a face-valid definition of LESA and non-LESA.
The pattern of non-LESA cells in Table 19 is somewhat irregular, with, for example,
several cells having USE=zero being labeled non-LESA while similar cells with higher
values of USE are labeled LESA. Such irregularities are probably due to the
small number of respondents in many of the cells.

In order to make the definitions of LESA and non-LESA more face-valid, we
looked for relatively simple combinations of response patterns that would correspond
closely to the cell assignments produced by the above algorithm. For example,
consider Definition 1.

Definition 1:

A non-LESA child is one with a USE score of 3 or a SPUND score of 9 or 10 (or
both).

A LESA child is one with any other response pattern.
The correspondence of Definition 1 with List is:

                            List
                    LESA        -LESA       Total
Def. 1    LESA      531 (83%)   152 (33%)    683
          -LESA     111 (17%)   304 (67%)    415
          Total     642         456         1098

76% classified the same by List and Def. 1
58% classified LESA by List
62% classified LESA by Def. 1
% Bias = +6

The 76% accuracy of this simple rule compares reasonably well with both the
79% maximum accuracy attainable using SPUND, USE, and YEARS, and the 77% and 78%
accuracies obtained by the discriminant functions (Table 2). The definition
overestimates the number of LESAs to a modest extent.

Now consider a slightly more complex definition:

Definition 2:

A non-LESA child is one with at least one of the following patterns:

1. A USE score of 3

2. A SPUND score of 10

3. A SPUND score of 8 or 9 and a USE score of 1 or 2 and a YEARS
   score greater than 3.

A LESA child is one with any other response pattern.
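
Both definitions translate directly into small decision rules. The Python sketch below is an illustration only; the function names are hypothetical, and SPUND, USE, and YEARS are assumed to be available as integers coded as described in the text.

```python
def is_non_lesa_def1(spund, use):
    """Definition 1: USE score of 3, or SPUND score of 9 or 10 (or both)."""
    return use == 3 or spund in (9, 10)


def is_non_lesa_def2(spund, use, years):
    """Definition 2: at least one of the three listed patterns holds."""
    return (
        use == 3
        or spund == 10
        or (spund in (8, 9) and use in (1, 2) and years > 3)
    )


def classify(spund, use, years, definition=2):
    """Return 'non-LESA' or 'LESA' under the chosen definition."""
    if definition == 1:
        non_lesa = is_non_lesa_def1(spund, use)
    else:
        non_lesa = is_non_lesa_def2(spund, use, years)
    return "non-LESA" if non_lesa else "LESA"


# A child with SPUND=8, USE=1, YEARS=4 is LESA under Definition 1
# but non-LESA under Definition 2.
print(classify(8, 1, 4, definition=1), classify(8, 1, 4, definition=2))
```

The two rules differ only for SPUND scores of 8 and 9: Definition 1 ignores YEARS, while Definition 2 requires some English use and more than three YEARS before treating such a child as non-LESA.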




The correspondence of Definition 2 with List is:

                            List
                    LESA        -LESA       Total
Def. 2    LESA      543 (85%)   133 (29%)    676
          -LESA      99 (15%)   323 (71%)    422
          Total     642         456         1098

79% classified the same by List and Def. 2
58% classified LESA by List
62% classified LESA by Def. 2
% Bias = +5

Definition 2 performs slightly better than Definition 1 both in terms of
providing a slightly smaller overestimation of LESAs and in terms of classifying
more people the same as did List. Since Definition 2 is preferred, its performance
by group, using both List and FCTR as criteria, is given in Tables 20-26. Comparing
the performance figures for Definition 2 (Table 20) with those of the overall
discriminant function (Table 11), we see overall performance being highly similar,
with the discriminant functions slightly underestimating the number of LESAs and
Definition 2 slightly overestimating them. Performance within group was considerably
more variable; however, the same patterns generally emerged. The MELP, regardless
of form, tends to slightly overestimate the number of LESAs or be quite accurate in
the Spanish and Chinese groups, while it rather severely underestimates the number
of LESAs in the Other Asian and Navajo groups. The reason for this is not entirely
clear. One possible factor is that both dichotomous criteria were geared to the
local schools' definitions of LESA and non-LESA. (In all analyses reported above,
FCTR was cut at a different place in each group in order to dichotomize it.)
Table 20: Performance of Definition 2 relative to List and FCTR, by group;
Children's Data.

                             Overall     Cubans      Chicanos    Chinese     Other Asians
                             List FCTR   List FCTR   List FCTR   List FCTR   List FCTR
% Classified the same by
  criterion and Def. 2        79   77     80   74     85   82     75   69     71   71
% Classified LESA by
  criterion                   58   58     66   66     52   52     63   63     40   40
% Classified LESA by Def. 2   62   62     72   72     62   62     72   72     29   29
% Bias                        +5   +6     +9   +9    +18  +18    +13  +13    -28  -28

Navajos: [column values illegible in source]



Table 21: Overall Sample: Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA      543 (85%)   133 (29%)    676
          -LESA      99 (15%)   323 (71%)    422
          Total     642         456         1098

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA      534 (83%)   142 (31%)    676
          -LESA     106 (17%)   316 (69%)    422
          Total     640         458         1098
Table 22: Cubans: Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA      188 (90%)    40 (37%)    228
          -LESA      22 (10%)    67 (63%)     89
          Total     210         107          317

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA      178 (85%)    50 (46%)    228
          -LESA      31 (15%)    58 (54%)     89
          Total     209         108          317



Table 23: Chicanos: Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA      181 (95%)    43 (25%)    224
          -LESA      10 (5%)    130 (75%)    140
          Total     191         173          364

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA      175 (92%)    49 (28%)    224
          -LESA      15 (8%)    125 (72%)    140
          Total     190         174          364


Table 24: Chinese: Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA       81 (87%)    24 (45%)    105
          -LESA      12 (13%)    29 (55%)     41
          Total      93          53          146

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA       76 (83%)    29 (54%)    105
          -LESA      16 (17%)    25 (46%)     41
          Total      92          54          146
Table 25: Other Asians: Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA       26 (49%)    12 (15%)     38
          -LESA      27 (51%)    68 (85%)     95
          Total      53          80          133

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA       26 (49%)    12 (15%)     38
          -LESA      27 (51%)    68 (85%)     95
          Total      53          80          133



Table 26: Navajos (Ganado): Accuracy of Definition 2.

A. List as criterion

                            List
                    LESA        -LESA       Total
Def. 2    LESA       67 (71%)    14 (33%)     81
          -LESA      28 (29%)    29 (67%)     57
          Total      95          43          138

B. FCTR as criterion

                            FCTR
                    LESA        -LESA       Total
Def. 2    LESA       68 (72%)    13 (30%)     81
          -LESA      26 (28%)    31 (70%)     57
          Total      94          44          138
However, if San Francisco and Arizona schools use higher criteria of English
ability in order to place a child on the non-LESA list, then the MELP should
underestimate the number of LESAs in both the Chinese and Other Asian groups, since
both attended San Francisco schools. This is not the case; it overestimates the
Chinese and underestimates the Other Asians. Of course, it is possible that the
schools systematically demand more English from one group than the other, but that
seems unlikely.

Another hypothesis might be that since the average level of English proficiency
among Other Asians and Navajos (as measured by the Test) was quite high, parents
might use different standards of comparison in those groups and systematically
underrate their children on the important variables SPEAK and UNDERSTAND relative
to parents in the other groups where the general level of English proficiency and
use is less. Unfortunately, however, such a tendency would lead to an opposite
effect to the one observed -- an overestimation of LESAs in the more proficient group.

A third, less interesting explanation may stem from the different distributions
of English proficiency in LESA and non-LESA categories within the various groups.
The observed underestimation effect could obtain if most LESA children in the Navajo
and Other Asian groups were just below the cut-off point between the two categories
(on the test) while most of the non-LESAs were considerably above it in each of those
groups. One needs only to assume that misclassification by the MELP is simply a
direct function of the distance of the individual's test score from the cut-off
point. Similarly, an overestimation of LESAs could occur if most non-LESAs were
just above the cut-off score while most LESAs were considerably below it. The
within-group test distributions are generally consistent with this explanation.
4. Scoring Keys to be Recommended for Use with SIE Data

On the basis of the analyses detailed above, three different scoring keys can
be recommended as having done the best overall job of replicating the LESA and
non-LESA categorizations of the children in the field test, as categorized by school
list and dichotomized FCTR. Two of these scoring keys take the form of linear
equations employing the ten MELP variables as terms and multiplying each by a
coefficient. One was derived with FCTR as the criterion and the other with list
as the criterion. The equations are given below:

Y(FCTR) = .82 + .01*WHEN - .22*SPEAK - .11*UNDERSTAND - .13*SIB - .07*FRIEND
          - [illegible]*HLANG - .15*YEARS + .09*BIRTH - .01*GRADE - .08*PARENT

Y(LIST) = [constant illegible], .12*WHEN, .29*SPEAK, .08*[terms through YEARS
          illegible; signs of the first terms illegible]
          - .30*BIRTH - .02*GRADE - .07*PARENT

If, for any individual child, the obtained value of Y(FCTR) is greater than or equal
to -.10, then the child is to be categorized as LESA. If the value obtained is
less than -.10, the child is non-LESA. Exactly the same rule applies to Y(LIST), with
-.10 also being the cutting point for that equation.

The third scoring key is Definition 2 in Section 3:

A child is to be considered non-LESA if his response pattern meets at least one
of the following conditions:

1. SPUND = 10

2. USE = 3

3. SPUND = 8 or 9 and USE = 1 or 2 and YEARS greater than 3.

All other children are to be considered LESA.

It is important to stress that these scoring keys have been derived and calibrated
for optimal performance on the field test data only. Chapter IX will take
up the problems in applying these scoring keys to the SIE data in order to derive
national estimates of LESA individuals. At this point, a simple warning is in
order: It is likely that some recalibration will be necessary before these
scoring keys can be used to estimate percentages of LESAs from SIE data.
VIII. Derivation of Scoring Keys - Adults

The field test design was considerably different for adults than it was for
children. In particular, while all children were sampled from lists provided by
local school districts, lists of adults could be obtained from schools in only
two locations, Dade County and El Paso. Thus, in those sites, adult samples
were chosen entirely from lists of individuals who were currently enrolled in the
local adult education program or who had recently been so enrolled. In the other
locations, adults were selected from the households of the children's sample.
This difference in sampling strategy probably resulted in more heterogeneity of
adults between sites than would have been the case if the same sampling plan had
been used in all locations. It is important, then, to describe the samples of
adults in somewhat more detail than was necessary for children.

1. Description of Adult Samples

In the Cuban (Dade County) and Chicano (El Paso) samples, respondents were
essentially self-selected in the sense that they had enrolled themselves in adult
education programs. On the other hand, the adults in the Navajo (Arizona) and
Asian (San Francisco) groups were selected on the basis of the elementary
school-aged children in their households having been screened for English proficiency
and thus placed on a child list. The adult groups differed on many characteristics,
but two variables, age and highest educational level attained, are displayed in
Tables 1 and 2 as general indices of the differences among the groups. It should
be noted that over a third of the Cubans were over 60 years old while no other
group had more than 5% over that age. Also, teenagers were relatively numerous
only in the Other Asian and Navajo groups. With respect to education, Cubans,
Chinese, and Other Asians were much more highly educated than Chicanos and Navajos.
Between one-third and one-half of the former groups reported having had at least
some post-secondary education while only 5% and 15% of the latter two groups
respectively reported post-secondary work. These differences in demographic
characteristics across the ethnic groups must be kept in mind when interpreting the
results of analyses.



Table 1: Adults: Percent of each group in each age category (numbers in
parentheses indicate cumulative percentages)

Age             Cubans      Chicanos    Chinese     Other Asian   Navajo
14 - 18          1 (1)       4 (4)       6 (6)      21 (21)       26 (26)
19 - 30          5 (6)      26 (30)     14 (20)     14 (35)       21 (47)
31 - 40         17 (23)     32 (62)     32 (52)     31 (66)       32 (79)
41 - 60         42 (65)     33 (95)     45 (97)     31 (97)       18 (97)
61 and over     35 (100)     5 (100)     3 (100)     3 (100)       3 (100)
Total N        272         202         111         116           214


Table 2: Adults: Highest Grade Reached (%)
(numbers in parentheses indicate cumulative percentages)

Highest Grade        Cubans      Chicanos    Chinese     Other Asian   Navajo
none - 6th grade     21 (21)     65 (65)     19 (19)      8 (8)        19 (19)
7 - 9th grade        19 (40)     19 (84)     18 (37)     10 (18)       29 (48)
10 - 12 grade        24 (64)     11 (95)     31 (68)     32 (50)       37 (85)
College              19 (83)      2 (97)     23 (91)     40 (90)        9 (94)
Graduate Work        16 (99)      3 (100)     9 (100)    10 (100)       6 (100)
Total N             272         202         111         116           214

2. The Analysis Plan for the Adult Data

In general, the analyses for adults were designed to be analogous to those
for children. An important difference, however, was that list information was
not available for many adults, and so the test became the primary criterion measure
of English proficiency. The analyses can be very briefly summarized as follows:

1. A dichotomous criterion variable, interpretable as a categorization of
LESA and non-LESA, was constructed as described in Chapter VI.

2. Using this dichotomous criterion variable, discriminant analyses were
run. Eleven MELP variables served as discriminators, and separate analyses were
run both within and across groups.

3. Contingency table analysis (a la Cochran and Hopkins) was performed using
five of the eleven predictors. This led to the construction of an explicit
operational definition of LESA-non-LESA which could be used as an alternative to the
discriminant function.

3. Discriminant Analysis: Adult Data

The procedure for doing discriminant analysis was generally the same as that
with the child data. The SPSS statistical routines were used, and all analyses
used the eleven MELP predictor variables defined in Chapter V. The dichotomized
FCTR score was used as the criterion variable, and separate analyses were done for
each of the five ethnic groups as well as over all groups. For the Cuban group, a
separate analysis was done using the list information as the criterion.

The results are presented in Tables 3-5. Table 3 presents the four-fold
tables characterizing the degree of success with which the MELP variables could
predict LESA and non-LESA categorizations as defined by FCTR. The total percent of
correct classifications and the bias are given in Table 4. It should be noted here
that the discriminant functions involved in these analyses were different for each
group. That is, for Cubans, the predictions of LESA were made strictly on the
basis of the discriminant functions derived from the Cuban data only; for Chinese,
the predictions are based on a strictly Chinese discriminant function, etc.

It is clear that, across groups, the percent of individuals classified the
same by MELP and FCTR is relatively stable, between 76% and 84%. However, the
amount of bias in predicting the proportion of LESAs in a group varies considerably.
In terms of the difference between the percent of LESAs as determined by FCTR and the
percent determined by MELP, the range is from predicting 7% too few LESAs among
Cubans and Chicanos to predicting 13% too many LESAs in the Other Asian group. But
in terms of percent bias (the difference between the two percents divided by the
percent LESA as determined by FCTR), the figures range from predicting 11% too few
LESAs among Cubans to predicting 88% too many among Other Asians and Navajos.
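
The two ways of expressing bias can be made concrete with a short Python sketch. The function names are hypothetical; the counts are the row and column totals of the Cuban and Other Asian panels of Table 3 (number LESA by FCTR, number predicted LESA by MELP, and group size).

```python
def point_difference(n_lesa_criterion, n_lesa_predicted, n_total):
    """Difference between the two LESA percentages, in percentage points."""
    return 100 * (n_lesa_predicted - n_lesa_criterion) / n_total


def percent_bias(n_lesa_criterion, n_lesa_predicted):
    """The same difference, relative to the criterion's LESA count."""
    return 100 * (n_lesa_predicted - n_lesa_criterion) / n_lesa_criterion


# Cubans: 185 LESA by FCTR, 165 predicted LESA, 272 respondents.
print(round(point_difference(185, 165, 272)), round(percent_bias(185, 165)))

# Other Asians: 17 LESA by FCTR, 32 predicted LESA, 116 respondents.
print(round(point_difference(17, 32, 116)), round(percent_bias(17, 32)))
```

The contrast shows why percent bias is so large in groups with few LESAs: an absolute difference of 13 percentage points among Other Asians is measured against a base of only 17 LESA respondents.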

Table 3: Results of discriminant analysis: accuracy of prediction of eleven MELP
variables, using FCTR as the criterion.

A. Cubans
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA      142 (77%)    23 (26%)    165
             -LESA      43 (23%)    64 (74%)    107
             Total     185          87          272

B. Chicanos
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA      135 (84%)    10 (24%)    145
             -LESA      25 (16%)    32 (76%)     57
             Total     160          42          202

C. Chinese
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA       49 (88%)    16 (29%)     65
             -LESA       7 (13%)    39 (71%)     46
             Total      56          55          111

D. Other Asian
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA       12 (71%)    20 (20%)     32
             -LESA       5 (29%)    79 (80%)     84
             Total      17          99          116

E. Navajo
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA       20 (77%)    29 (15%)     49
             -LESA       6 (23%)   159 (85%)    165
             Total      26         188          214

F. Overall
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA      389 (88%)   102 (22%)    491
             -LESA      55 (12%)   369 (78%)    424
             Total     444         471          915


Table 4: Accuracy of the within-group-derived discriminant functions, predicting
dichotomized FCTR.

                              Overall   Cubans   Chicanos   Chinese   Other Asians   Navajo
% Respondents categorized
  the same by FCTR and MELP     83        76        83        79          78           84
% LESA by FCTR                  49        68        79        50          15           12
% LESA by MELP                  54        61        72        59          28           23
% Bias                         +11       -11        -9       +16         +88          +88
Table 5 gives the unstandardized and standardized discriminant coefficients
on which the analyses discussed above were based.

Since list information was available for Cubans, it was possible to do a
discriminant analysis within that group only, using list as the criterion variable.
The results of this analysis are presented in Tables 6 and 7.

Table 6: Results of discriminant analysis for Cubans using dichotomized School
List as the criterion variable.

                               List
                       LESA        non-LESA    Total
Predicted    LESA      135 (73%)    22 (25%)    157
             -LESA      49 (27%)    66 (75%)    115
             Total     184          88          272

74% classified the same
LESA by list = 68%
LESA by MELP = 58%
Bias = -15%
Table 7: Results of discriminant analysis for Cuban adults showing standardized
(S) and unstandardized (U) discriminant function coefficients.
School list information as the criterion variable. (All numbers given
are to two decimal places. Decimal points omitted.)

Sample Size: 272

Variables        U      S
WHEN            03     02
SPEAK          -62    -64
UNDERSTAND      15     17
KID            -01     00
FRIEND         -02     00
HLANG          -69    -17
YEARS          -08    -12
NEWS            28     22
BIRTH          -19    -27
GRADE          -10    -47
INCOME         -01    -01
CONSTANT       300


Within the Cuban adult sample, the MELP variables do not relate to the lists 
quite as well as they do to FCTR. They classify slightly fewer individuals the 
same when list is the criterion than when FCTR is, and they do so with more bias 
relative to list than relative to FCTR. 

Performance of the overall discriminant function by group. Since it will not
be possible for NCES to derive a separate discriminant function for each ethnic
group surveyed by the SIE, the discriminant function derived from the entire pool
of adult field test data must be evaluated as to how well it performs within each
ethnic group involved in the field test. In order to do this, the aggregate analysis
reported in Table 3F was broken out by ethnic group. The results are given
in Tables 8 and 9. They indicate that the overall discriminant function does
reasonably well within each group. In terms of percent respondents categorized
the same, the overall function does better in the Chicano, Other Asian, and
Navajo groups than do the locally derived functions and it does slightly worse in
the Cuban and Chinese groups than do the local functions. In terms of bias, the
difference between the percent LESA by FCTR and the percent LESA by MELP ranged from
3% for Other Asians to 14% for Chinese. Expressed as percent, the bias ranges from
an underestimate of 18% for Other Asians to an overestimate of 27% and 31% for
Chinese and Navajos respectively. These bias figures compare favorably with those
deriving from the local discriminant functions given in Table 4. This analysis
unequivocally supports the conclusion that, for the ethnic groups represented in
this field test, little if anything would be gained by using locally derived
discriminant functions instead of using the discriminant function derived from all
groups pooled.
Table 8: Accuracy by group of discriminant function derived from entire sample.

A. Cubans
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA      167 (90%)    53 (61%)    220
             -LESA      18 (10%)    34 (39%)     52
             Total     185          87          272

B. Chicanos
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA      155 (97%)    25 (60%)    180
             -LESA       5 (3%)     17 (40%)     22
             Total     160          42          202

C. Chinese
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA       51 (91%)    20 (36%)     71
             -LESA       5 (9%)     35 (64%)     40
             Total      56          55          111

D. Other Asian
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA        9 (53%)     5 (5%)      14
             -LESA       8 (47%)    94 (95%)    102
             Total      17          99          116

E. Navajo
                               FCTR
                       LESA        -LESA       Total
Predicted    LESA       19 (73%)    15 (8%)      34
             -LESA       7 (27%)   173 (92%)    180
             Total      26         188          214
Table 9: Accuracy within each group of the discriminant function derived from
entire sample.

                              Overall
                              (from Table 4)   Cubans   Chicanos   Chinese   Other Asians   Navajos
% Respondents categorized
  the same by FCTR and MELP        83            74        85        77          89           90
% LESA by FCTR (from Table 4)      49            68        79        50          15           12
% LESA by MELP                     54            81        89        64          12           16
% Bias                            +11           +19       +13       +27         -18          +31
4. Derivation of a Scoring Key Through Contingency Table Analysis



An alternative to the discriminant function approach is the direct analysis 
of a multiway contingency table according to the procedure reported by Cochran 
and Hopkins (1961). This approach was employed for the child data (see Chapter VII).
Instead of deriving a linear equation to categorize individuals, this method simply 
enumerates all possible patterns of responses to the MELP variables and assigns 
each to either LESA or non-LESA according to the relative numbers of LESA and non- 
LESA respondents (as determined by the criterion measure) displaying that particular 
response pattern. An advantage of the method is that it makes no assumptions about 
the distributions of the predictors (except for assuming that they are discrete), 
but a disadvantage is that it becomes unwieldy with a large number of possible 
response patterns. In order to apply it to the child data, the number of MELP 
predictors was reduced by elimination and consolidation from ten to three. A 
similar reduction was also needed in order to apply it to the adult data. The 
first part of this section, then, will describe the process of reducing the number 
of predictors to a manageable number and the second will report the analysis proper. 

Reducing the number of MELP variables 

In order to make the data restricted enough for the Cochran and Hopkins analysis,
the number of possible response patterns was reduced. There were three possible
ways of doing that:

1. Elimination of variables

2. Reducing the number of response alternatives within a variable

3. Constructing a single composite variable from several variables

In the child analysis, all three strategies were used. That is, WHEN, PARENT,
BIRTH, and GRADE were eliminated. The second strategy was employed with YEARS, and
the third was employed in combining SPEAK and UNDERSTAND into SPUND (thus reducing
25 possible response patterns to 9) and in combining HLANG, SIB, and FRIEND into
USE (reducing 27 possible response patterns to 4). All three strategies were also
used in the adult data.

SPUND - As with the child data, SPUND was defined simply as the sum of the
numerical values of SPEAK and UNDERSTAND. The justification for this was as
follows: first, the two variables were highly correlated in all ethnic groups
(approximately .80 in all groups and overall). Second, both were approximately
equally correlated with FCTR and inspection of the three way contingency table of
SPEAK by UNDERSTAND by FCTR did not show any distinctive relationships between any
two of the three. Thus, a more intricate combining of the two variables did not
seem called for. Third, the possibility of eliminating one or the other on grounds
of parsimony was not pursued because the two variables were the most closely
related to FCTR of any of the predictors, and the inclusion of both was thought to
aid the reliability of the MELP.

The USE variables - There were three language use variables among the
eleven predictors: HLANG, KID, and FRIEND. These were tested to see if they
formed a Guttman scale in the same way that HLANG, SIB, and FRIEND did for children.

The three way cross tabulation of the items is given below:



HLANG= not English 
KID 



HLANG= English 
KID 



not English 

not English 550 
FRIEND 



English 57 



English 
48 



not English 

not English 29 
FRIEND 



42 



English 



English 
53 



127 
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In order for there to be a meaningful Guttman scale, four of the cells must 
be almost empty and four must be relatively large. Clearly, that is not the case
in the above table, and so the idea of compositing these variables was dropped.

In order to decide which variables to eliminate from the set of predictors,
two sorts of evidence were inspected. First, we inspected the correlations of each
of the eleven predictors with FCTR (not dichotomized) within each group and overall.
Those correlations are reproduced below:

                                    Ethnic Group
                          (decimal points omitted)

MELP Variable   Cuban   Chicano   Chinese   Other Asian   Navajo   Overall

WHEN              14       07        45          33         -06       41
SPEAK             58       53        74          47          50       73
UNDERSTAND        58       53        69          44          45       71
KID               04       14        53          08          42       41
FRIEND            18       18        56          37          37       49
HLANG            -06       08        54          20          40       45
YEARS             30       39        73          39          59       69
NEWS             -41      -32       -57         -38         -55      -55
BIRTH             16       05        38          32          20       34
GRADE             40       29        44          44          56       43
INCOME            18       15        30          37          15       31

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A rank ordering of these correlations indicates that the most important 
predictors besides SPEAK and UNDERSTAND are YEARS, NEWS, FRIEND, HLANG, and
GRADE. These variables were placed in a stepwise discriminant analysis within each
group, and the order in which the variables were entered into the analysis was
observed. The results indicated that the four most important variables (in
addition to SPUND) for predicting dichotomized FCTR were YEARS, NEWS, HLANG, and
GRADE. These variables plus SPUND were therefore retained for use in the
contingency table analysis.

Even within this reduced set of variables, however, it was important to
further reduce the number of possible response patterns. Thus, YEARS, NEWS, HLANG,
and GRADE were dichotomized. This was done by going back to the crosstabulations
of each variable by FCTR to ascertain how to cut the variable and still maintain
maximum discriminating power with respect to FCTR. The following dichotomizations
were made:

Variable    "low" values                        "high" values

YEARS       0 - 3                               4 and over
NEWS        "never" and "occasionally"          "often"
HLANG       "no response" and any response      "English"
            except "English"
GRADE       0 through 6th grade                 7th grade and above




The basic crosstabulation, then, was SPUND x YEARS x NEWS x HLANG x GRADE.
It had a total of 144 cells. Each cell represented a particular pattern of MELP
responses, and within each cell was placed the number of adults displaying that
response pattern and the proportion of those who were classified LESA by FCTR.
That crosstabulation is reproduced as Table 10. (In it SPUND categories 2-7 have
been collapsed to facilitate presentation.)
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Table 10 ; Percent LESA adults for various combinations of SPUIn^D, YEARS, NEl^S ; 



HLANG, and GRADE. ND indicates no respondents in that cell; ( ) indicates
N in cell.

NEWS=low, HLANG=low, GRADE=low

                 YEARS
SPUND       Low          High
2-7         72 (25)      0 (1)
8           67 (9)       20 (5)
9           ND           0 (2)
10          ND           0 (1)

NEWS=high, HLANG=low, GRADE=low

                 YEARS
SPUND       Low          High
2-7         92 (165)     64 (11)
8           44 (9)       25 (4)
9           0 (1)        100 (1)
10          100 (1)      0 (2)

NEWS=low, HLANG=high, GRADE=low

                 YEARS
SPUND       Low          High
2-7         100 (6)      100 (1)
8           50 (2)       0 (2)
9           0 (1)        ND
10          ND           0 (1)

NEWS=high, HLANG=high, GRADE=low

                 YEARS
SPUND       Low          High
2-7         63 (8)       ND
8           0 (2)        0 (3)
9           100 (2)      ND
10          ND           ND

NEWS=low, HLANG=low, GRADE=high

                 YEARS
SPUND       Low          High
2-7         74 (27)      11 (19)
8           24 (21)      13 (23)
9           20 (5)       0 (11)
10          ND           0 (25)

NEWS=high, HLANG=low, GRADE=high

                 YEARS
SPUND       Low          High
2-7         76 (156)     38 (34)
8           36 (42)      17 (41)
9           0 (2)        14 (7)
10          0 (1)        0 (11)

NEWS=low, HLANG=high, GRADE=high

                 YEARS
SPUND       Low          High
2-7         100 (2)      0 (5)
8           50 (4)       3 (29)
9           ND           0 (6)
10          33 (3)       3 (73)

NEWS=high, HLANG=high, GRADE=high

                 YEARS
SPUND       Low          High
2-7         100 (3)      0 (2)
8           0 (3)        6 (18)
9           ND           20 (10)
10          ND           0 (18)




The Cochran and Hopkins procedure calls for assigning to the category LESA 
any cell which has a larger proportion of LESAs than does the sample as a whole. 
In this case, 49% of the total sample of adults are LESAs, so any cell with 50% 
or more LESAs was considered to be LESA. Using this procedure on the entire 144 
cell table, the following table was derived representing the predictive accuracy
of the five variables relative to dichotomized FCTR. 



                                    FCTR

                           LESA          non-LESA       Total

Predicted    LESA       86%   384      16%    76         460

             non-LESA   14%    60      84%   395         455

             Total            444            471         915

85% categorized the same
% LESA by FCTR = 49
% LESA by MELP = 50
% Bias = 4

This represents the maximum correspondence that any explicit operational defi- 
nitions of LESA and non-LESA involving these five variables could have with FCTR. 

An examination of Table 10 indicates that the most powerful predictor variables 
were SPUND and YEARS. Performing a Cochran and Hopkins analysis on just these two 
predictors, the following table was derived: 



                                    FCTR

                           LESA          non-LESA       Total

Predicted    LESA       82%   363      17%    78         441
(SPUND,
YEARS)       non-LESA   18%    81      83%   393         474

             Total            444            471         915

83% classified the same
49% classified LESA by FCTR
48% classified LESA by MELP
-1% Bias
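
The summary figures under these four-fold tables can be recomputed from the cell counts. The sketch below is a hedged illustration: the function name `summarize_key` and its argument names are hypothetical, and the bias formula (the difference between the number called LESA by the scoring key and the number called LESA by the criterion, as a percentage of the latter) is an assumption inferred from the fact that it reproduces the printed values.

```python
def summarize_key(both_lesa, key_only_lesa, criterion_only_lesa, both_non_lesa):
    """Summarize a 2x2 table of scoring-key category by criterion category.

    both_lesa            key LESA,     criterion LESA
    key_only_lesa        key LESA,     criterion non-LESA
    criterion_only_lesa  key non-LESA, criterion LESA
    both_non_lesa        key non-LESA, criterion non-LESA
    """
    total = both_lesa + key_only_lesa + criterion_only_lesa + both_non_lesa
    criterion_lesa = both_lesa + criterion_only_lesa
    key_lesa = both_lesa + key_only_lesa
    return {
        "percent_same": 100 * (both_lesa + both_non_lesa) / total,
        "percent_lesa_criterion": 100 * criterion_lesa / total,
        "percent_lesa_key": 100 * key_lesa / total,
        # Assumed definition of bias: relative over- or under-count of LESAs.
        "percent_bias": 100 * (key_lesa - criterion_lesa) / criterion_lesa,
    }


# Cell counts from the SPUND-YEARS table and the five-variable table.
two_variable = summarize_key(363, 78, 81, 393)
five_variable = summarize_key(384, 76, 60, 395)
```

Rounded to whole percents, these values agree with the figures printed under the two tables.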

The pattern of cells underlying the above table happens to conform exactly to
the following definitions of LESA and non-LESA.

1. A respondent is non-LESA if: (SPUND > 7) or (YEARS > 3)

2. A respondent is LESA if he has any other values of SPUND and YEARS.






The reduction of five predictors to two predictors loses only two percent in
the number of respondents classified the same by MELP and FCTR, and the amount
of bias remains very low for the sample of adults as a whole. Thus, it is this
definition that we would choose for adults. The accuracy of the definition within
each ethnic group is given in Tables 11 and 12. Percent categorized the same by
the definition and FCTR ranged from 76 to 90. The absolute difference between
percent identified as LESA by FCTR and that identified as LESA by the definition
varied from essentially zero to 8% while the percent bias varied from a 6%
overestimation to a 53% underestimation.
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Table 11 ; Performance of SPUND -YEARS scoring key by group. 



A. Cubans

                                FCTR
                       LESA          non-LESA       Total
MELP     LESA       86%   159      44%    38         197
         non-LESA   14%    26      56%    49          75
         Total            185             87         272

B. Chicanos

                                FCTR
                       LESA          non-LESA       Total
MELP     LESA       91%   146      50%    21         167
         non-LESA    9%    14      50%    21          35
         Total            160             42         202

C. Chinese

                                FCTR
                       LESA          non-LESA       Total
MELP     LESA       80%    45      18%    10          55
         non-LESA   20%    11      82%    45          56
         Total             56             55         111

D. Other Asian

                                FCTR
                       LESA          non-LESA       Total
MELP     LESA       24%     4       4%     4           8
         non-LESA   76%    13      96%    95         108
         Total             17             99         116

E. Navajo

                                FCTR
                       LESA          non-LESA       Total
MELP     LESA       35%     9       3%     5          14
         non-LESA   65%    17      97%   183         200
         Total             26            188         214
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Table 12 ; Accuracy of SPUND-YEARS scoring key by group. 

                        Overall   Cuban   Chicano   Chinese   Other Asian   Navajo

% categorized
the same by FCTR
and MELP                   83       76       83        81          85          90

% categorized
LESA by FCTR               49       68       59        50          15          12

% categorized
LESA by MELP               48       72       61        50           7           7

% Bias                     -1       +6       +3         0         -53         -42

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5> Recommended Scoring Keys for Categorizing; Adults as LESA and non-LESA 

On the basis of the analysis detailed above, two alternative scoring keys
are recommended for categorizing adults as LESA and non-LESA on the basis of their
MELP responses:

A. A discriminant function involving eleven predictor variables. The
discriminant function was derived by pooling all adult data into a single analysis.
The equation is as follows:

^FCTR=- • OS-n^HEN - . 25-^ PEAK- . 13'^^-UNDERSTAKD- . OS^-'KID - . 05''^FRIEND \ . 12HLANG- . OS^-'YEARS 
\ . 19^NE\^S - . 10^-BIRTH- . 03'^GRADE - . 06^--INCOME \2 . 06 
For any given respondent, if his discriminant function score is above 0.02 he 
is assigned to the LESA category. If his score is equal to or below that value, 
he is assigned to the non-LESA category. 

B. An operational explicit definition involving the variables SPUND and YEARS.
An adult is assigned to the category non-LESA if his response pattern conforms to
either of the following patterns:

1. SPUND greater than 7 

2. YEARS greater than 3 

All other adults are assigned to the LESA category. 
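
Scoring key B is simple enough to state as a one-line rule. The following Python sketch is illustrative only: the function name `classify_adult` and the argument names are hypothetical, and it assumes SPUND is the sum of the numerical SPEAK and UNDERSTAND responses and YEARS is the reported number of years, as described in the preceding section.

```python
def classify_adult(spund, years):
    """Apply scoring key B: non-LESA if SPUND > 7 or YEARS > 3, else LESA."""
    if spund > 7 or years > 3:
        return "non-LESA"
    return "LESA"


# A few hypothetical response patterns.
examples = {
    (6, 2): classify_adult(6, 2),    # low SPUND, few years
    (8, 0): classify_adult(8, 0),    # high SPUND alone is sufficient
    (5, 4): classify_adult(5, 4),    # YEARS above 3 alone is sufficient
    (7, 3): classify_adult(7, 3),    # both at the cutoff, so still LESA
}
```

Because the two conditions are joined by "or", a respondent is counted as LESA only when both SPUND and YEARS fall at or below their cutoffs.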

With respect to overall performance, these two scoring keys are approximately
equivalent; however, they were derived using markedly different approaches. The
discriminant function approach is basically a multiple regression approach and its
strengths and weaknesses are well-known. For example, it assumes continuous
predictors (which we clearly do not have). The contingency table approach requires
very few assumptions; however, the data from the field test are relatively sparse
in some regions of the table and thus generalizing from them may be risky. Which
scoring key is used depends on an individual's preference.
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IX. Finding an Unbiased Estimator of the Proportion of LESAs in the U. S. 

In Chapters VII and VIII, we have developed scoring keys which give rela- 
tively useful results for predicting the dichotomized LESA distributions of the 
respondents in our field test; however, the sampling plan of the field test
differed from that of the SIE in important ways and the ramifications of these
differences must now be considered. Many of the issues raised in this chapter 
and the solutions proposed to deal with them were spelled out in a conference attended 
by representatives of CAL, RTI, and NCES and by a number of our technical consultants. 
The proceedings of that conference and the list of participants can be found in 
Appendix 17. 

1. The List Samples 

With respect to children, using list samples delivered to us by the schools had
several advantages. The sampling required almost no statistical expertise or prior
knowledge of the communities on the part of RTI or CAL. Also, the dichotomous
property of the lists was invaluable for constructing scoring keys that yielded
dichotomous classifications. However, the use of lists also had disadvantages.
The first disadvantage was that RTI and CAL essentially lost control of how
children were selected onto the lists from the pool of all children in the school
districts who had been screened for their English proficiency. Thus, we have
no grounds for assuming that the lists in the various sites were random samples
of all the children in that age range who were so classified or that the interviews
obtained were a random sample of the lists obtained from the schools. For example,
in each site, about a third of the addresses given as the children's residences
were found to be wrong, and informal evidence indicated that some parents
deliberately gave the school incorrect addresses to avoid busing or some other
administrative regulation. In the majority of cases such children were simply
replaced with
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others from the same list. Also, in San Francisco, lists were constructed from 
the rosters of only a few schools selected for their high concentrations of the 
ethnic groups we were examining. These are just two of the factors that caused
the samples of children interviewed to be decidedly non-random parts of the LEA's 
potentially LESA populations. 

The sampling problem with adults was also serious. In Miami and El Paso, 
all adults were sampled from the pool of individuals who had recently enrolled 
in adult education classes. How that pool relates to the general pool of
non-native-English speakers in those areas as legislatively defined is completely
unknown. In Arizona and San Francisco, where adults were taken from the
households of the children, all of the sampling problems of the children apply to the
adults with the additional qualification that all these adults came from households 
containing an elementary school aged child. 

2. The Distribution of LESAs and non-LESAs 

A second, quite different problem was that RTI was instructed to interview
approximately equal numbers of individuals on each of the lists they obtained. This
led to the production of scoring keys which had approximately equal error rates for
the identification of LESAs and non-LESAs. However, we have reason to suppose that
the two categories are not at all in equal proportions nationally. A recent census
of the Spanish speaking school population of Dade County (Florida) indicates
children on the "independent" list to be three to four times more numerous than the
children on the other two lists combined. Similarly, but more indirectly,
preliminary tabulations from the July, 1975 "Survey of Languages," done by the Bureau
of the Census for NCES, indicate that a large majority of school children whose
native language (as defined legislatively) is not English are reported by Household
Respondents to have "no difficulty" in speaking or understanding English.
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The problem is that if the scoring key is to provide an unbiased estimate 
of the proportion of LESA children, its rates of identification errors must be
proportional to the relative numbers of LESA and non-LESA children in the population.
To illustrate: Suppose we have a procedure which identifies both LESA and non-LESA
children with an error rate of 25%. This could be expressed in a four fold table as:



                                    True Category

                                 LESA       non-LESA

Application of      LESA          75%          25%
fallible procedure
                    non-LESA      25%          75%

                                 100%         100%




If this fallible procedure were applied to a population with equal numbers
of LESAs and non-LESAs, it would yield an unbiased estimate of the population
proportion of LESAs since it would falsely identify the same numbers of LESAs and
non-LESAs. Applied to a population of 1000:
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True Category 




                                 LESA          non-LESA      Estimated Totals

Application of     LESA       75%   375      25%   125            500
MELP Procedures
                   non-LESA   25%   125      75%   375            500

Actual Totals                 500 (100%)     500 (100%)

75% categorized correctly by MELP
50% True LESAs
50% Categorized LESA by MELP
 0% Bias


However, now consider the same procedure applied to a population of 1000 where the
true number of LESAs is only 200:



                                    True Category

                                 LESA          non-LESA      Estimated Totals

Application of     LESA       75%   150      25%   200            350
Procedure
                   non-LESA   25%    50      75%   600            650

Actual Totals                       200            800

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75% Categorized correctly by IMELP 
20% True LESAs 

35% Categorized LESA by MELP 
+ 75% Bias 
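
The arithmetic behind the two illustrations can be written out directly. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, and percent bias is taken to be the difference between the estimated and true share of LESAs as a percentage of the true share, which is the definition that reproduces the 0% and +75% figures above.

```python
def estimated_lesa_share(true_share, miss_rate, false_alarm_rate):
    """Share of the population that a fallible procedure labels LESA.

    miss_rate        fraction of true LESAs labeled non-LESA
    false_alarm_rate fraction of true non-LESAs labeled LESA
    """
    correctly_found = true_share * (1.0 - miss_rate)
    falsely_added = (1.0 - true_share) * false_alarm_rate
    return correctly_found + falsely_added


def percent_bias(true_share, estimated_share):
    """Relative error of the estimate, in percent of the true share."""
    return 100.0 * (estimated_share - true_share) / true_share


# Same 25% error rates, two different true proportions of LESAs.
balanced = estimated_lesa_share(0.50, 0.25, 0.25)
unbalanced = estimated_lesa_share(0.20, 0.25, 0.25)
balanced_bias = percent_bias(0.50, balanced)
unbalanced_bias = percent_bias(0.20, unbalanced)
```

With equal error rates the estimate is pulled toward 50%, so it is unbiased only when the true proportion is itself 50%.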
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In this case, while the procedure still errs at the same rate in each category, 
the resulting estimate of the true proportion of LESAs is highly biased: 35%
as compared with a true proportion of 20%. Clearly what is needed is a revised
procedure which will misclassify equal numbers of children rather than equal
percentages. But this involves adjusting the error rates in a ratio equal to the
ratio of LESAs to non-LESAs in the population.

For example, if the true ratio of LESA to non-LESA persons in the population
was one to four, as in the above example, then in order to misclassify equal
numbers of individuals the procedure would have to identify non-LESAs with an
error rate one fourth the magnitude of the error rate involved in identifying LESAs.
This could yield the following table:



                                    True Category

                                 LESA          non-LESA      Estimated Totals

Application of     LESA              75      16%   125            200
Procedure
                   non-LESA   63%   125      84%   675            800

Actual Totals                       200            800

75% Categorized correctly by MELP
20% True LESAs
20% Categorized LESA by MELP
 0% Bias

We have chosen the numbers in this table so that it has the same total number
of individuals categorized correctly as the table above it (75%); in other words,
the two procedures have the same overall validity. However, in this latter case,
the procedure does very badly in identifying LESAs (classifying more wrong than
right), and exactly four times better (63%/4=16%) in identifying non-LESAs. This
leads to balanced numbers of false positive LESA identifications and false
negative ones. There are a number of ways in which empirically-derived identification
procedures such as those in Chapters VII and VIII can be calibrated to display a
particular ratio of false positive and false negative identifications; but in order
to do such a calibration, either the true proportion of LESA individuals in the
population must be estimated in advance (to estimate this is the reason for the
survey in the first place) or the "true" error rates of identification in the
population of interest (the SIE population) must be known. Unfortunately, because of
the sampling factors already discussed, we can have no confidence in estimating
these from the field test results.

This problem is treated in depth both from theoretical and empirical
perspectives by Hartwell et al. in the Research Triangle Institute's final report
to CAL on their subcontract for this project. The reader is referred to section
V.F.4 and page 100 in that report. (Hartwell, Moore, Weeks, Mason, and Shah;
Design, Data Collection and Analysis of Instruments and Procedures to Measure
English Language Proficiency. Research Triangle Park, North Carolina: Research
Triangle Institute. April, 1976.)

Basically, Hartwell explored three ways of coping with the problem. First,
he artificially simulated the expected relative proportions of LESA and non-LESA
respondents in the nation by creating a new data file in which all non-LESA
data in the field test corpus were duplicated 4 times. This new file was then
subjected to discriminant analysis. The results indicated that the discriminant
functions derived from the new file were similar to the original functions, but
that they over-estimated the percent of LESA individuals in all groups by 8% to
144%. Overall, the overestimation was 28%.
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Second, Hartwell explored the use of a correction factor that could be 
applied to the SIE data to estimate the percent LESA children nationally. This
correction factor, however, assumes that the user has accurate estimates of the
rates at which the identification procedure (the MELP) produces false positive
and false negative identifications of LESA in the SIE context. The only estimates
available are from the field test data, and they are suspect because of the
non-random selection of respondents within the LESA and non-LESA categories already
discussed. Nevertheless, Hartwell et al. present evidence that such a post
hoc correction may be more accurate than attempting to derive a usable scoring
key through simulation techniques. This procedure generally produced
underestimates of LESA proportions (in four of the five groups) rather than the
overestimates resulting from the simulations. The deviations of the estimations from
percent LESA as defined by list ranged from an overestimation of 30% in % bias
terms in the Other Asian group to an underestimation of 39% in the Cuban group.
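
Hartwell's correction formula is not reproduced in this report. As a sketch of the general idea only, the Python below uses the standard algebraic correction for a classification procedure whose false negative and false positive rates are taken as known; this is an assumption standing in for the RTI formula, which may differ. The function name `corrected_lesa_share` and the illustrative rates (the 25% error rates and 35% observed share from the hypothetical example in section 2) are chosen for the example.

```python
def corrected_lesa_share(observed_share, false_negative_rate, false_positive_rate):
    """Back out the true LESA share from the share a procedure labels LESA.

    Derived from: observed = true * (1 - fn) + (1 - true) * fp,
    which rearranges to: true = (observed - fp) / (1 - fn - fp).
    """
    denominator = 1.0 - false_negative_rate - false_positive_rate
    if denominator <= 0:
        raise ValueError("error rates too large for the correction to be defined")
    return (observed_share - false_positive_rate) / denominator


# A procedure with 25% error rates labeled 35% of the population LESA.
corrected = corrected_lesa_share(0.35, 0.25, 0.25)
```

The correction is only as good as the error rates supplied to it, which is the caution raised above about using field test estimates in the SIE context.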

A third technique, favored over the other two by RTI, was for a two-stage 
sampling plan to be executed as part of the SIE. This would involve obtaining
criterion information (perhaps both list and FCTR or DORP) on a representative
subsample of children from the SIE households as soon after their regular
interview data were gathered as possible. From this information, accurate national
estimates of the percent of LESA could be derived and the scoring keys could either 
be rederived or recalibrated. 

3. Adjusting the Face-Valid Definitions of LESA and non-LESA 

In pursuing the recalibration of the MELP to accommodate it to the expected
low proportion of LESAs in the SIE sample, RTI only worked with the discriminant
analysis; however, the face-valid definitions can also be adjusted to give a more
accurate estimate of LESAs in the SIE context. Basically what is desired is to
find a definition which, when applied to the field test data, will yield a ratio
of false positives of LESA identification to false negatives equal to the ratio
of LESAs to non-LESAs in the SIE sample. In particular,

    proportion of False Negatives       proportion of non-LESAs
    -----------------------------   =   -----------------------
    proportion of False Positives       proportion of LESAs

Let us assume that for children the above ratio is four to one in the general
population of non-native English speaking children and what is required is a
modification of Definition 2 to accommodate it to this fact. Over the entire
field test sample, Definition 2 yielded a ratio of 99/133 or .74. What is needed
is to redefine the non-LESA category to include more respondents. Applying the
Cochran and Hopkins procedure to this situation, p becomes .80 as the criterion
for deciding whether a given SPUND-USE-YEARS response pattern is to be considered
LESA or not. Table 19 in Section VII has 13 cells with percent LESA above 80. A
possible definition might be:

Definition 3: A child is non-LESA if his response pattern meets either
of the following conditions:

1. A USE score of 2 or 3

2. A SPUND score of 8, 9, or 10
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The correspondence of this definition with list is: 



                                     LIST

                              LESA          non-LESA       Total

Definition 3    LESA       50%   323       7%    33         356

                non-LESA   50%   319      93%   423         742

                Total            642            456

68% classified the same by List and Def. 3
58% classified LESA by List
32% classified LESA by Definition 3
-45% bias.



Now, suppose we artificially simulate a "true" LESA - non-LESA ratio of
1 to 4 from the field test data by simply multiplying the list non-LESA counts
by 5.63, obtaining:

                                     LIST

                              LESA          non-LESA       Total

Definition 3    LESA       50%   323       7%   186         509

                non-LESA   50%   319      93%  2382        2701

                Total            642           2568        3210

84% classified the same by List and Def. 3
20% classified LESA by List
16% classified LESA by Def. 3
-21% Bias
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Definition 3 in a sense overshoots its objective by classifying too few respondents 
as LESA even with only 20% actual LESAs in the population. (This compares with
an overestimation of 101% if Definition 2 were applied to the above simulated
population.) Clearly, Definition 3 could be modified slightly to categorize
a slightly larger % of respondents as being LESA. (For example, changing condition
1 of Definition 3 to include USE values of only 3 would result in 29% of the 
simulated population being categorized as LESA.) Such "fine tuning" of these 
definitions has a completely ad hoc character with unknown generalizability 
beyond the samples involved here. Nevertheless, it is important to note that a 
number of reasonably face-valid definitions can be easily formulated, each with 
distinct implications for the magnitudes of the LESA counts obtained through 
their use. 

4. Summary of Recalibration Recommendations

CAL recommends RTI's double sampling proposal with the added recommendation
that the face-valid definitions suggested above and some similar ones be tested
on the data obtained in the double sampling effort. This would be in addition
to re-deriving the discriminant functions using those data. If such a double
sampling is not possible to implement, then the correction formula suggested
by Hartwell can be used. (Note, however, that Hartwell's cautions on page 94
are extremely important.) In any case, the behavior of a scoring key in estimating
different proportions of LESA individuals must be kept in mind. That is, if

    P (False Negatives)       P (non-LESAs in the population)
    -------------------   >   -------------------------------
    P (False Positives)       P (LESAs in the population)

then the number of LESAs in the population will be systematically under-estimated
while if the inverse obtains the number of LESAs will be over-estimated. Putting
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it slightly differently, if we estimate the ratio of error rates for some 
scoring key to be, say, 50%/7% or 7.14 (as for Definition 3), then we know that
for populations in which the true ratio of non-LESAs to LESAs is less than that,
there will be an underestimation of LESAs. Thus, if Definition 3 were applied
to the SIE data and yielded a % LESA of 30, we would know that that was an
underestimate. (Again, this assumes that the error rates as observed in the field
test are reasonably accurate.) On the other hand, if we obtained an estimate of
30% using Definition 2, then we would know that it was an overestimate since the
ratio of error rates for Definition 2 is 0.52 while the observed ratio of
non-LESAs to LESAs was 70%/30% or 2.33.
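
The rule of thumb in this section compares two ratios, and can be checked mechanically. The sketch below is illustrative: the function name `direction_of_bias` is hypothetical, and the rates passed to it are the ones quoted in the text (50% and 7% for Definition 3, and the equal 25% rates of the earlier hypothetical example).

```python
import math


def direction_of_bias(false_negative_rate, false_positive_rate,
                      non_lesa_share, lesa_share):
    """Compare the error-rate ratio with the population ratio.

    If P(false negative) / P(false positive) exceeds the ratio of non-LESAs
    to LESAs, LESAs are undercounted; if it falls below, they are overcounted.
    """
    error_ratio = false_negative_rate / false_positive_rate
    population_ratio = non_lesa_share / lesa_share
    if math.isclose(error_ratio, population_ratio):
        return "unbiased"
    if error_ratio > population_ratio:
        return "underestimate"
    return "overestimate"


# Definition 3 error rates applied to a population that is 20% LESA.
definition_3 = direction_of_bias(0.50, 0.07, 0.80, 0.20)
# Equal 25% error rates applied to the same population.
equal_rates = direction_of_bias(0.25, 0.25, 0.80, 0.20)
```

The first case agrees with the negative bias found when Definition 3 was applied to the simulated 1 to 4 population, and the second with the positive bias in the earlier hypothetical example.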

5. Adults

The proposal for double sampling and recalibration of the scoring keys
applies only to children because it is only for them that there are relatively
unequivocal dichotomous classifications of LESA and non-LESA available externally
to the SIE (that is, from schools). If adults were to be double-sampled, the
criterion instruments which could be used would be a test or a direct rating. 
Neither of these, however, has a non-arbitrary way of dichotomizing the scores 
obtained from them into LESA and non-LESA categorizations. Thus, the criterion 
instruments would not lead to a robust estimation of the proportion of LESA 
adults in the nation. 

The alternatives for adults would seem to be two:

1. Use the discriminant function to estimate LESAs and simply keep in mind
that if the obtained proportion deviates greatly from .5 (approximately
what it was in the field test), it is a biased estimate; i.e., P(LESA) > .5
implies a probable underestimation and P(LESA) < .5 implies a probable
overestimation.

2. Use the face-valid definition and simply depend on its manifest content
to provide an accurate count of LESAs. This amounts to assuming that if
an adult is claimed to have spent more than three years in an English-
language school or is claimed to speak and understand English well, then
he is counted as non-LESA.
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X. Accuracy of MELP Data as Reported by a Household Respondent about Another 
Adult in the Household, 

In the SIE, the Household Respondent will generally be the source of all
information about each individual in the household. The purpose of this section
is to explore the quality of the data given by the Household Respondent about
another adult (14 years old and older) member of the household. Such data, which
we will call proxy data, will be examined both for its correspondence with
first-hand data (information which an adult gives about himself) and for its
correspondence with dichotomized FCTR. It should be noted that this problem arises
only with adults because children (13 and younger) will never be asked to give
information to the SIE interviewer; all information about the children in a
household will be obtained from the Adult Household informant. Thus, all child data
will be proxy data.

So far in this report, all analyses that have been reported for adults have 
been based on first-hand data that an individual gave about himself. (All child 
data analyses in this report are based entirely on proxy data.) However, during 
the field test, whenever there was an adult available in the household in addition 
to the Designated Adult Respondent, he or she was asked to serve as a Household 
Respondent and provide answers to the MELP questions about the Designated Respon- 
dent. Unfortunately, in many households there was not an appropriate additional 
person available so proxy data were unobtainable. 

The following table indicates the amounts of proxy data available in the var- 
ious groups for analysis: 




                                        Household              Proportion
Group           Adult Respondents   (proxy) Respondents        proxy data

Cuban                  272                  118                   .43
Chicano                202                   96                   .48
Chinese                111                   45                   .41
Other Asian            116                   48                   .41
Navajo                 214                  178                   .83

Total                  915                  485                   .53



A first question to be asked is whether the proxy respondent gives 
answers at all to the MELP questions. Table 1 gives the percent of answers 
of "don't know" or "no answer" for proxy data for each MELP question. 

Table 1: Percent Scoreable and Unscoreable Answers given by Household Respondents
about Other Adult Members of the Household. (All Groups Combined, N = 485)

MELP Variable     Scoreable Response     Don't Know     No Answer

WHEN                      96                  4              0
SPEAK                     97                  0              3
UNDERSTAND                97                  0              3
SIB                      100                  0              0
FRIEND                    99                  1              0
HLANG                     99                  0              1
YEARS                     84                  3             13
BIRTH                     84                 11              5
GRADE                     92                  7              1

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It clearly shows that the rate of usable responses is very high for all 
variables except the "historical" ones (that is, those asking for specific
facts about an individual's background), in which case 8 to 16% of the responses
were either not recorded or "don't know". These rates may be higher than those
to be encountered in the SIE proper for the following reason: In the field test,
the household respondent was instructed to answer the questions about the
designated respondent on the basis of his own knowledge. There was to be no "pooling"
of information from any and all members of the household present at the time.
In the SIE, however, the interviewer will make an effort to obtain complete
information on each household member from whomever is available at the moment. 
In other words, the interviewer will not be compelled to talk to only one 
individual per household. Thus, we might expect more complete information using 
that procedure. 

An important statistic to be derived from the field test data is the
number of usable protocols that could be entered into a scoring key and thus
from which a LESA - non-LESA categorization could be derived. In the case of
the first-hand data, the total sample for which FCTR scores were available was
1150 while the total number for which there was complete MELP data was 915, or
approximately 80%. In the case of the proxy data, there were 454 FCTR scores
and 313 complete MELP protocols (69%). Thus, if these data provide reasonable
guidance, NCES should expect up to 10% fewer complete protocols derived from proxy
data than from first-hand data.
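
The two completion rates quoted above are simple ratios of complete MELP protocols to available FCTR scores. As a minimal sketch (the function name `completion_rate` is ours, not the report's), the arithmetic can be checked as follows:

```python
def completion_rate(complete_protocols: int, fctr_scores: int) -> float:
    """Share of cases with an FCTR score that also have a complete MELP protocol."""
    return complete_protocols / fctr_scores


# Counts quoted in the paragraph above.
first_hand = completion_rate(915, 1150)
proxy = completion_rate(313, 454)

print(f"first-hand: {first_hand:.0%}, proxy: {proxy:.0%}")
print(f"difference: {100 * (first_hand - proxy):.0f} percentage points")
```
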

A second question about proxy data is: are the data obtained as
predictive of LESA and non-LESA categorizations as are first-hand data? To
answer this question, the overall discriminant function was applied to the 313
proxy protocols. The following table indicates the resulting correspondence
with dichotomized FCTR scores:

                                         FCTR

                                 LESA    non-LESA    Total

Discr. Function     LESA          118        38        156
MELP (using proxy
data)               non-LESA       19       138        157

                    TOTAL         137       176        313

82% categorized the same by Test and MELP
44% categorized LESA by test
50% categorized LESA by MELP
+12% Bias



These figures are highly similar to those for first-hand data. There the percent
categorized the same by test and MELP was 83% and the bias was +11%. The
correspondence with test of the SPUND-YEARS definition of LESA - non-LESA when
applied to proxy data is given below:

                                         TEST

                                 LESA    non-LESA    Total

Definitional MELP   LESA          107        25        132
(proxy data)
                    non-LESA       30       151        181

                    TOTAL         137       176        313

82% categorized the same by test and MELP
44% categorized LESA by test
42% categorized LESA by MELP
-4% Bias

Again, these figures are highly similar to those for first-hand data.
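
The agreement and LESA percentages printed under the two tables above follow directly from the cell counts. The sketch below recomputes them; the function `summarize` and the list layout are our own illustration, and the report's "Bias" figure is not recomputed because its exact formula is not stated in this section.

```python
def summarize(table):
    """Summarize a 2x2 table laid out as table[melp][test],
    where index 0 = LESA and index 1 = non-LESA."""
    n = sum(sum(row) for row in table)
    return {
        "n": n,
        # Cases on the main diagonal are classified the same way by both.
        "agree": (table[0][0] + table[1][1]) / n,
        # Column 0 total: classified LESA by the test (FCTR).
        "lesa_by_test": (table[0][0] + table[1][0]) / n,
        # Row 0 total: classified LESA by the MELP rule.
        "lesa_by_melp": (table[0][0] + table[0][1]) / n,
    }


# Cell counts copied from the two proxy-data tables above.
discriminant = [[118, 38], [19, 138]]
definitional = [[107, 25], [30, 151]]

for name, table in [("discriminant", discriminant), ("definitional", definitional)]:
    s = summarize(table)
    print(name, {k: (v if k == "n" else f"{v:.0%}") for k, v in s.items()})
```
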



Finally, Table 2 gives the cross tabulations of the first-hand and proxy
data for the three most important MELP variables: SPEAK, UNDERSTAND, and YEARS.

Table 2: Cross tabulations of Proxy by First Hand Responses to Selected MELP
Questions

A. SPEAK
                                 Proxy Response

                         1      2      3      4      5    Total

                 1      34     16      1      4      1      56
                 2      16    133     11     27      6     193
First Hand       3       1      5      1      6      0      13
Response         4       0     12      3     70     41     126
                 5       1      3      0     17     76      97

             Total      52    169     16    124    124     485

Response options:
    1 = Not at all
    2 = Just a little, don't know, or missing data
    3 = Adequate for a few purposes
    4 = Well, adequate, or adequate for most purposes
    5 = Very well

B. UNDERSTAND
                                 Proxy Response

                         1      2      3      4      5    Total

                 1      31      9      3      3      0      46
                 2      13    102     11     30      4     160
First Hand       3       1      7      5     11      2      26
Response         4       1     17      4     84     55     161
                 5       1      5      1     12     73      92

             Total      47    140     24    140    134     485

C. YEARS
                                 Proxy Response

                                                                   Missing
                         0     1     2     3     4     5    >5     or DK    Total

                 0      90     8     3     3     1     0     1      16       122
                 1       7    32     2     3     0     0     0       7        51
                 2       8     6    11     1     2     0     1      10        39
First Hand       3       1     0     5    15     2     0     4       7        34
Response         4       1     1     1     2     3     1     2       1        12
                 5       0     0     0     2     0     5     4       4        15
                >5       2     1     2     1     1     2   163      17       189
     Missing or DK       5     2     1     0     0     0     0      15        23

             Total     114    50    25    27     9     8   175      77       485



It is clear from this table that proxy responses are generally similar to
first-hand responses. In Tables 2A and 2B, 65% and 61% of the responses
respectively were identical for first-hand and proxy respondents. In Table 2C,
66% of the responses fall on the main diagonal. For both SPEAK and UNDERSTAND,
there is a tendency for the Household Respondent to rate the Designated
Respondent slightly higher than the Designated Respondent would rate himself.
The mean first-hand rating for SPEAK is 3.03 while the mean proxy rating is
3.20. For UNDERSTAND the mean first-hand rating is 3.19 compared to 3.36 for the
mean proxy rating. (These differences are both statistically reliable at
p < .01.) Such a tendency is not apparent in the YEARS variable, with means of
5.18 and 5.16 for the first-hand and proxy responses respectively. In fact,
there were no statistically reliable tendencies for first-hand and proxy data to
differ systematically from each other on any of the other MELP variables. Given
the slightly higher ratings in proxy data, one would expect a correspondingly
slight tendency for estimating fewer LESAs from proxy data than from first-hand
data. Assuming the discriminant function to be roughly normally distributed,
that difference would be about 2%.
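
The agreement percentage and the two mean ratings reported for SPEAK can be recovered from the Table 2A cell counts. The following is a minimal sketch under that reading of the table (rows are first-hand responses, columns are proxy responses); the helper names are ours.

```python
# Table 2A (SPEAK): rows = first-hand rating 1..5, columns = proxy rating 1..5.
speak = [
    [34, 16, 1, 4, 1],
    [16, 133, 11, 27, 6],
    [1, 5, 1, 6, 0],
    [0, 12, 3, 70, 41],
    [1, 3, 0, 17, 76],
]


def diagonal_share(table):
    """Proportion of cases where proxy and first-hand ratings are identical."""
    total = sum(sum(row) for row in table)
    return sum(table[i][i] for i in range(len(table))) / total


def mean_rating(counts):
    """Mean of ratings 1..k given the number of cases at each rating."""
    return sum((i + 1) * c for i, c in enumerate(counts)) / sum(counts)


first_hand_counts = [sum(row) for row in speak]        # row totals
proxy_counts = [sum(col) for col in zip(*speak)]       # column totals

agreement = diagonal_share(speak)
mean_first_hand = mean_rating(first_hand_counts)
mean_proxy = mean_rating(proxy_counts)

print(f"identical responses: {agreement:.0%}")
print(f"mean first-hand: {mean_first_hand:.2f}, mean proxy: {mean_proxy:.2f}")
```

The same two helpers apply unchanged to the UNDERSTAND counts in Table 2B.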
Summary

Comparisons of the data elicited from Household Respondents (proxy data)
and Designated Respondents (first-hand data) lead to the following generalizations:

1. On questions calling for specific information about a person's background
(birth date, education, etc.), there were approximately 10% fewer responses given
by Household Respondents than by Designated Respondents. Different interviewer
instructions in the SIE should result in a smaller percentage of "Don't know"
and "No Response" codes being transcribed.

2. On all other MELP questions there was essentially complete data from
Household Respondents.

3. The overall relationship of proxy data to FCTR was very similar to
that of first-hand data.

4. On SPEAK and UNDERSTAND there were slight but significant tendencies for
proxy ratings to be higher than first-hand ratings. This could lead to an
underestimation of LESAs from proxy data of about 2% relative to first-hand data.

XI. A Comparison of Monolingual and Bilingual Interviewers



In the SIE, most of the interviewers will be able to speak and understand
only English. Concern was expressed by both LGRs and technical consultants that
this type of interviewer might be less effective in obtaining accurate information
from a potential LESA individual than a bilingual interviewer (i.e., an individual
who speaks English and the language of the respondent). As a result of this
concern, a study was conducted during the field test to compare the effects of
monolingual vs. bilingual interviewers on data collection results.

Over all sites 101 interviewers were employed: 50 were monolingual (English)
and 51 were bilingual, speaking both English and the language of the respondent,
and were members of the respondents' ethnic groups. Within each site five pairs of
interviewers were assigned to work in five separate sub-areas of the site. Each
pair consisted of one monolingual and one bilingual interviewer. The interviewers
were randomly selected to participate in the substudy and the sample cases assigned
to each pair member were randomized. In the San Francisco site, only Chinese
bilingual interviewers were available. Thus, Other Asians did not participate in
this study.

Instructions to the interviewers for administering the census type questions
(including the MELP questions) were as follows: All interviewing was to be carried
out in English whenever possible. If communication with the respondent was too
difficult or inaccurate, then one of two courses of action was to be taken:

1. Bilingual interviewers were to switch to the respondent's native language
whenever necessary.

2. Monolingual (English-speaking) interviewers were to find another individual,
either in the household or from the neighborhood, who could act as
translator.

Of course, both the tests and the DORP were administered entirely in English.

Three different analyses were run on the data to compare bilingual vs.
monolingual effects. Each of these will be described below.

1. Production Data

One way in which bilingual and monolingual interviewers could differ is in the
number of interviews completed. In fact, several LGRs predicted in June that
monolingual Anglo interviewers would be faced with a much higher rate of refusals
to be interviewed than would interviewers who were members of the respondents'
ethnic-linguistic group. Thus, the expectation was that with respect to gross
quantity of data collected, bilingual interviewers would be more productive than
monolingual interviewers. (It should be noted that monolinguals' instructions were
to find someone in the neighborhood to translate if communication with all members
of the household was insufficient to conduct the interview. Bilingual interviewers
were instructed to conduct the interview in English whenever possible and to use
the other language only when absolutely necessary.)

Table 1, reproduced from RTI's final report (their table IV.4), summarizes
various production statistics for monolingual and bilingual interviewers. Notice
that this is for all interviewers, not just the ten in each location who were
matched with each other, and it is for all respondents, both child and adult. There
appear to be no large differences between the two types of interviewers in terms
of the number of respondents interviewed. In fact, the monolingual interviewers
completed more (64%) interviews than did the bilingual interviewers (61%). This
discrepancy is statistically significant for the Navajos and Chinese and also for
all groups pooled together (z = 0.49, 2.16 and 2.11, respectively).

Refusal rates were low in all groups, and there were no significant differences
between monolingual and bilingual interviewers in this regard. These results do

Table 1

COMPARISON OF DATA COLLECTION RESULTS FOR
MONOLINGUAL AND BILINGUAL INTERVIEWERS 1/



No. of Interviewers^ 



Potential Respondents 
Assigned— 



Respondents Interviewed 
(Percent) 



Mono.. Bi . 



7T 



10 



Refused * 
(Percent) 



Other Nonrespondcr.4 
(Percent) 



A/ 



Total Nonrespondcnts 
(Percent) 



Total Hours Charged 



3/ 



Total Miles Driven 



6/ 



249 
(62%) 



15 



675 



A19 
(62%) 



11 

(3%) 



14^ 
(36%) 



155 
(38%) 



15 
(2%) 



Ki0';'!O 



Mcr.c J hi . I Mono. 



Ictol 



397 



260 
(65%) 



lA 



67A 



10 



533 



394 
(74%) 



(2%) 



2A1 
(36%) 



256 
(38%) 



1099 



Average Kours Per 
Interview 



Average Kiles Per 
Interview 



9412 



13554 



4.4 



37.8 



1S17 



130 
(33%) 



137 
(35%) 



11 
(2%) 



13 



4 39 



279 
(64%) 



(2%) 



232 
(34%) 



131 
(25%) 



245 
(36%) 



116?! ISIO 
1 



8106 



4.3 



32.4 



4.6 



31.2 



12973 



4*2 



139 
(26%) 



803 



470 
(59%) 



39 
(5%) 



3S9 



202 
(52%) 



50 



2137 



51 



:i77 



1373 
(64%) 



15 
(4%) 



152 
(35%) 



160 
(36%) 



294 
(37%) 



333 
(41%) 



1717 



15399 



4.4 



30.1 



4S.0 



1486 



5.3 



55.2 



1937 



172 
(44%) 



187 
(45%) 



980 



7088 



4,1 



15.1 



65 
(3%) 



699 
(33%) 



1331 
(61%) 



49 
(2%) 



797 
(37%) 



764 
(36%) 



5935 



43535 



4,9 



4.3 



6.0 



31.7 



846 
(39%) 



6093 



43137 



4.6 



32,4 



1/ Figures in this table are based upon manual counts and computations by
interviewers and supervisors and have not been verified by machine tabulations.

2/ All interviewers spoke English. For purposes of this study, "monolingual"
referred to interviewers who did not also speak the language of the respondent,
while bilingual interviewers did speak the respondent's language.

3/ In Miami and El Paso both children and adults were assigned to interviewers.
In Arizona and San Francisco only children were assigned, since no adult lists
were obtained for these sites. Interviewers randomly selected an adult from each
sample child's household in these sites. For Arizona and San Francisco, therefore,
the number of potential respondents was twice the number of sample children
assigned.

4/ Examples of "other" nonrespondents include cases where the sample member had
moved to another city; where the address was nonexistent; where the sample member
could not be contacted at home in the prescribed number of interviewer visits;
where the sample member was out of town; or where he was sick, institutionalized,
or otherwise unavailable.

5/ Includes training time.

6/ Includes mileage incurred in connection with training.

not support the predictions of the LGRs that monolingual interviewers would have
difficulty obtaining interviews. One factor that may have played a role here is
that the monolinguals, as a group, had more prior experience in interviewing than
did the bilinguals. Only 11 of the 51 bilinguals had interviewing experience prior
to this project while 22 of the 50 monolinguals were experienced interviewers.
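
The report does not say which test produced the statistics quoted for the completion rates. As a hedged illustration, the sketch below assumes a pooled two-proportion z statistic and applies it to the all-sites totals as read from Table 1 (1,373 of 2,137 potential respondents interviewed by monolinguals; 1,331 of 2,177 by bilinguals). Under that assumption the result agrees with the 2.11 quoted for all groups pooled together.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> float:
    """Pooled two-proportion z statistic (an assumed choice of test;
    the report does not name the procedure it used)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# All-sites totals read from Table 1: interviewed / potential respondents assigned.
z_pooled = two_proportion_z(1373, 2137, 1331, 2177)
print(f"monolingual {1373 / 2137:.0%} vs bilingual {1331 / 2177:.0%}, z = {z_pooled:.2f}")
```
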

2. Comparisons of MELP and Test Data as Gathered by Monolinguals and Bilinguals

A second way in which monolingual and bilingual interviewers could differ
was in the quality of the data they collected. In other words, were the responses
to some MELP questions and/or test items biased by the language ability and ethnic
group membership of the interviewer? To answer this question, the means of the
various MELP variables and the test total scores were compared for the matched
data of the five pairs of interviewers in each site.

Child Data. Table 2, after Table V.32 of RTI's final report, gives the means for
children and the results of t-tests on them. Out of 55 comparisons, there were
only three that were significantly different. Two of these occurred in El Paso:

Table 2: (Reproduced from RTI's Report, Table V.32) Sample Means and Summary
of t-tests on Monolingual Versus Bilingual Interviewer Means for Various
MELP Questions and Test Total Score, Data for Children for Paired
Interviewers Only.

              Interviewer
Variable      Type          Cubans    Chicanos    Navajos    Chinese    Over Groups


When 


Mono 
Biling 


ns 

1.76 


2.75„„ 

ns 

2.85 


2.98„„ 

ns 

3.00 


2.24n„ 
ns 

2.58 


2.49no 

nb 

2.50 


Speak 


Mono 
Biling 


3.38 „e 

ns 

3.23 


3.22^ 
3.76 


LLo 

3.66 


3-^Ons 

LLO 

3.65 


3^^6ns 
3.60 


Und 


Mono 
Biling 


3 . 66-,^ 
ns 

3.43 


3.45„„ 

ns 

3.69 


4.00^0 

ns 

3.89 


3.72no 

ns 

3.69 


3.71no 

no 

3.68 


Sib 


Mono 
Biling 


1.65 
1.65^^" 


1.95 
1.76'^" 


1.78 
1.86"" 


2.04 
2.15"' 


1.83 
1.84"' 


Frnd 


Mono 
Biling 


2.02 


2.23., 
1.87 


2 -13ns 
2.23 


2 ■56ns 
2.46 


2 •23ns 
2.14 


Hlang 


Mono 
Biling 


1.03 
1.00 


1.77 

ns 

1.75 


1.83 „ 

ns 

1.77 


ns 

1.46 


1.53„„ 

ns 

1.51 


Years 


Mono 
Biling 


2.22 


1.85 


3.34ns 
3.93 


2.84ns 
2.35 


2 •58ns 
2.54 


Birth 


Mono 
Biling 


65.9"' 


67.5 ^ 
67.5"' 


65.2 


67. 2„^ 
67.0"' 


66. 4„, 
66.4"' 


Grade 


Mono 
Biling 


4.90 


3.02 


5 •27ns 
5.30 


''^•l^ns 
3.85 


4.65* 
4.22 


Ped 


Mono 
Biling 


2 -85^3 
2.71 


2 -87^3 
3.00 


2 •89ns 
2.75 


3^68ns 
3.73 


2 •97ns 
3.08 


Test 


Mono 
Biling 


44.1 
41.6'^" 


41.9 
39.3'^' 


50.2 
50.1^" 


46.6 
49.8"^" 


45.6 
44.4"' 



Sample Mono 68 60 64 25 220 

Biling 51 55 44 26 186 

* = t-Test significant at .05 level.    ns = t-Test not significant at .05 level.

(1) In response to the question "how well does .... speak English,"
respondents tended to give higher assessments when asked by a bilingual vs. a
monolingual interviewer.

(2) In response to the question "what language does .... speak
to his friends", El Paso respondents claimed "English" slightly more often to
monolingual than to bilingual interviewers.

Only one overall comparison was significant: monolingual interviewers were told
that their respondents were in a slightly higher grade than were bilingual
interviewers. This is evidenced by the "Grade" comparison. The interpretation of
this finding is relatively unclear for several reasons:

1. Since overall completed interviews averaged less than two-thirds of
total assignments, the random assignment of interview loads to the
members of each pair may not have been preserved in the completed
interviews. Thus, it is possible that monolingual interviewers had a slight
tendency not to complete interviews with children in the lower grades.
However, it could also be that parents merely tend to report a higher
grade to monolingual interviewers.

2. One would expect that higher values of GRADE would be accompanied by
different BIRTH values, but such was not the case.

3. The tendency was not replicated across groups in a consistent way.

Finally, it should be noted that there were no mean test score differences
between those tests administered by monolingual and bilingual interviewers.

Adult Data. A similar comparison of means is presented in Table 3 for adult
(first-hand) data. In this case, across all groups, monolingual-interviewed
respondents scored significantly higher on the test than did bilingual-interviewed
respondents. They also scored significantly higher on the SPEAK, UNDERSTAND, and
INCOME variables.

Table 3: Sample Means and Summary of t-Tests on Monolingual Versus Bilingual
Interviewer Means for Various MELP Questions and the Total Test
Score, Data for Adults and Paired Interviewers Only

                                             Group

              Interviewer
Variable      Type          Cubans    Chicanos    Navajos    Chinese    Over Groups


^flien 


Mono 
Biling 


1.55 
1.60^^ 


1.97 
2.14^"^ 




1.82 
1.95^^ 


2.19 

2 Qgns 


Speak 


Mono 
Biling 


2.57 
2.19^^ 


2.16 
2.51'"^ 


4.22^ 
* 

3.71 


3.00 
2.90'^^ 


3.16^^ 
** 

2.77 


Under- 
stand 


Mono 
Biling 


2.98. 
2.43 


2.18 
2.67^^" 


4.23 
3.88'"^ 


3.06 
2.95^"^ 


3.29. 
2.94 


Kid 


Mono 
Biling 


1.47 
l.Sl'^^ 


1.24 
1.53'^^ 


2.10 
1.88^^^'' 


1.65 
1.65^^^ 


1.66 
1.64^^^ 


Friend 


Mono 
Biling 


1.14 
1.19^^ 


1.11 
1.35'"^ 


2.11 
1.88^^^ 


1.53 


1.52 
1.46''^ 


Hlang 


Mono 
Biling 


1.06 
1.02"^^ 


1.18 
1.33^^^ 


2.05 
1.75^"^ 


1.47 
1.25'"^ 


1.47 
1.34''^ 


Years 


Mono 
Biling 


1.61^ 
0.77 


1.71 
1. ou 


9.16 
8.97^"^ 


5.06 
5. 20 


4.83 
3.81''^ 


News 


Mono 
Biling 


1.98 
2.15'"^ 


2.08 
2.02'^^ 


1.56 
1.78'"^ 


2.24 
2.20'^''' 


1.87 
2.03"^ 


Bifthf 


Mono 
Biling 


2.13 
1.74^^ 


3.16 
3.16^^" 


3.80 
3.81^"^ 


3.00 
3.35^" 


3.08 
2.84'"^ 


Grade 


Mono 
Biling 


11.51 
10.13'^^ 


8.00 
9.00"^^ 


10.36 
9.84^"^ 


10.94 
10.65''^ 


9.74 
9.41^^ 


Income 


Mono 
Biling 


2.23 
1.86'^^ 


1.71 
1.77^" 


2.38^ 
1.78 


2.29 
1.95'"^ 


2-19** 
1.86 


Test 


Mono 
Biling 


19.98^ 
14.21 


17.66 
14. 35^^^ 


39.23 
37.84"^^ 


26.41 
23.00 


21.12 


Sample 
Size 


Mono 
Biling 


51 
53 


38 
43 


61 
32 


17 
20 


173 
155 



*  = t-test significant at .05 level
** = t-test significant at .01 level
ns = t-test not significant at .05 level
†  = coded by decade

Interestingly, these differences were not mirrored for all groups individually
except for test scores. Thus, although there is a vague pattern evident which
could be interpreted, it certainly is not definitive. Generally, respondents
interviewed by monolinguals appear to be somewhat more competent in English and
somewhat more affluent than those interviewed by bilinguals. As with the child
data, it is impossible to tell whether the results are due to a response bias or
a sampling bias. In the former case the hypothesis is that individuals answer
differently to monolinguals than to bilinguals, while in the latter case one would
assume that the difference lies in the people for whom interviews were and were
not completed; that is, monolinguals may complete a higher proportion of interviews
with respondents knowing more English and with larger incomes, while bilinguals
may complete a higher proportion of interviews with respondents knowing little
English and with small incomes.

To the extent that this is a viable explanation, it is worth elaborating on its
implications for the SIE. The principal reason for "incompleted" interviews in the
field test was that the individuals to be interviewed could not be found.
Sometimes the address was non-existent or the family was not known at the address.
In other cases, the individual to be interviewed was temporarily out of the area or
had moved without leaving a forwarding address. To a large extent, an interviewer's
rate of interview completions was a function of his or her ability to "track down"
the respondent. How monolingual and bilingual interviewers might have differed at
this task in the field test is moot presently, and may be irrelevant to the SIE in
any case, since the SIE interviewers will be assigned to addresses rather than
specific people. This should minimize the non-response rate due to inability to
locate the appropriate respondents.

3. Performance of the Monolingual and Bilingual Data in a Discriminant Function

As one final analysis, the child data collected by the matched monolingual and
bilingual interviewers were placed in the discriminant function derived using list
as criterion and recommended in Chapter VII. The resulting LESA and non-LESA
categorizations were then matched against list categorizations for those children.
The results are given in Table 4. Subsequent tests showed that none of the pairs
of percents differed significantly from one another. Thus, it can be concluded that
there is little evidence of systematic effect of interviewer type on LESA - non-LESA
classification.



Table 4: Performance of list discriminant function when used on data collected by
monolingual vs. bilingual interviewers: Child data, list as criterion.

                          Cubans      Chicanos     Navajos      Chinese      Overall
                         Mono  Bil   Mono  Bil    Mono  Bil    Mono  Bil    Mono  Bil

% classified the same
by MELP and list          79    67    83    84     64    71     80    73     78    74

% classified LESA
by List                   66    67    48    47     70    61     44    54     57    56

% classified LESA
by MELP                   72    76    58    56     70    54     48    42     63    59

% Bias                    +9   +15   +21   +19      0   -12     +9   -21    +11    +5


4. Summary

While this substudy did not show large differences in data collected by
monolinguals and bilinguals, its design had two weaknesses relative to its
implications for the SIE:

1. Monolinguals were generally more experienced at interviewing than
bilinguals. Apparently, RTI did not match the five pairs in each
site for experience. Therefore, we may be comparing data collected
by experienced monolinguals with those collected by inexperienced
bilinguals.

2. The list sampling procedure resulted in only a 60-65% response rate.
Thus, the results of this study confound two factors:
(a) differential skills in locating respondents, a skill not relevant
to the SIE;
(b) differences in answers to MELP questions given by respondents to
bilingual vs. monolingual interviewers.

In view of these problems, the results of the monolingual-bilingual comparisons
are not definitive in any sense.

Appendix 1

Letter to Center for Applied Linguistics from National Center for Education
Statistics requesting a proposal for research and development activities leading
to a Measure of English Language Proficiency.

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
OFFICE OF EDUCATION
WASHINGTON, D.C. 20202



Dr. Rudolph C. Troike, Director 
Center for Applied Linguistics 
1611 North Kent Street 
Arlington, Virginia 22209 

Dear Dr. Troike: 

On behalf of the National Center for Education Statistics, this 
office would be pleased to receive from the Center for Applied 
Linguistics a technical proposal to develop a validated measure 
of the Census for use in its survey of children counted for 
purposes of Title I, ESEA. The Census Bureau Title I survey is
mandated by P. L. 93-380, Sec. 822(A), and the survey of limited
English-speaking ability among persons from non-English language
backgrounds is mandated in Sec. 731(c)(1)(A) of the same Public
Law. Design specifications for the measure(s) to be developed
may be found in the attachment.

The due date for all final products for use by the Bureau of the
Census is October 3, 1975. The final report to NCES incorporating
all technical materials, full documentation, evidence of
reliability and validity of the measures developed and tested,
minutes of several advisory group meetings representing the
linguistic, "research," and ethnic communities, and all other
products to be agreed upon mutually may be submitted at a later
date, but not later than March 31, 1976. Submit each product
first in (at least one) draft and allow the NCES up to five
working days for review. Naturally, given the "tight" deadlines,
you may expect much quicker response; NCES will have
available at all times a project monitor and an associate to
expedite its review.

The technical proposal should contain the following: 

1. Introduction. This should contain a concise discussion
demonstrating your understanding of the problem of developing
a measure of limited English-speaking ability acceptable to
the Bureau of the Census.

2. Work plan. In this section of the proposal there are specific
descriptions of how you plan to design, establish and implement
the development program on a task-by-task basis. The proposal
should clearly state how you intend to proceed to identify
and develop measurement alternatives, to design the "test sites",
to arrange for development on-site, to compare and evaluate
measure alternatives, and to document the recommended measure
fully. The proposal must be exceedingly clear on how the Center
for Applied Linguistics intends to work with NCES to relate
jointly to the Bureau of the Census to produce the specific
products to be delivered for use by the Bureau of the Census.
The proposal should show how the CAL would establish system
evaluation criteria and parameters, obtain and use information
required for evaluation of measures and arrive at recommendations.
The technical proposal should demonstrate that the work plan
would produce a measure with the desired properties and in a
form (items, ratings, training materials, etc.) manifestly
acceptable to the Bureau of the Census and the NCES. The plan
should be comprehensive, going well beyond the information
contained in the statement of design specifications. A PERT
chart or other comparable plan for outlining the essential
steps to be conducted within the scope of this procurement,
their approximate duration and products to be delivered should
be included in this part of the proposal.

3. Personnel.

A. Vitae of all key professional project personnel. Specific
qualifications related to the proposed project should be
noted. Examples of previous work relevant to this project
by key personnel should be indicated (with identification
of sponsor and monitor) and should be available upon request.

B. Names, qualifications, and responsibilities of consultants
and subcontractors. (CAL is encouraged to utilize as
consultants minority professionals and as subcontractors
minority-owned firms with special capabilities relevant to
work in bilingual education. Also be certain to include
in the staffing at least one mathematical statistician
with experience designing studies or experiments for
survey-related work.)

4. Management plan. The proposal shall include a detailed
statement describing plans to organize, staff and manage the
project. It is estimated that the equivalent of approximately
four or five professional man-years of effort will be required,
exclusive of the costs of producing videotapes and renting
playback equipment in sufficient quantities (if indicated)
and costs of convening advisory groups for the work to be
carried out.

The plan should include a schedule by phase and tasks. An
organizational chart should also be submitted indicating the
relationship of the project team to the organization. The
technical proposal should provide a staffing plan by phase
and task with a table or chart showing each key individual
or category of support staff to be employed on the project,
descriptions of the tasks which each individual will perform,
the periods of time during which each task will be performed,
the number of person-days estimated for each individual for
each task, and total for estimated person-days by individual
and by task. (This same staffing plan should be included also
in the separate cost proposal.)

The separate cost proposal should repeat the staffing plan 
from the technical proposal, in identical format, and show 
the dollar cost for each individual for each assignment. 
Daily or hourly rates of pay for each person must be quoted. 
An itemized detailed budget is required, including documentation 
of the overhead costs. Costs for subcontracts included in the 
budget should be separately itemized. In addition, the costs 
and time estimated to be incurred for ADP personnel such as 
programming and computer analysts, should be identified by 
task. While the cost of the computer facility at DHEW will
be borne by the Government, CAL is requested to estimate the
costs of the usage of the DHEW computer in terms of dollars
or CPU minutes by phase. If the proposal suggests the use
of an outside computer for the processing of the data collected
in the field, the estimated cost should be specified.

Because the time to develop the measure under this proposed
procurement is rapidly running out, I would appreciate receiving
your proposal at the earliest possible opportunity, but no later
than close of business, Thursday, May 15. At that time we
will want 12 copies of the technical proposal and 3 copies
of the cost proposal. Send them to me at Room 1077, 400
Maryland Avenue, S.W., Washington, D.C. 20202.

This letter is not to be construed as a contract award nor
will your response to this letter obligate the Government to
make an award to you on the basis of your proposal.

If I may be of further assistance to you during the preparation 
of your proposal, please feel free to call me at 245-8530. 



Sincerely, 





Research,. Development dnd Statistics Branch 
Grant and Procurement /•lanagemont Division 



Attachment
Design specifications for MELP by Dr. Burton R. Fisher

DESIGN SPECIFICATIONS
FOR A MEASURE
OF

LIMITED ENGLISH-SPEAKING ABILITY
IN A NATIONAL SURVEY



Prepared for 
National Center for Educational Statistics 

Education Division 
Department of Health, Education, and Welfare

by 

Burton R. Fisher 

71? Knickerbocker Street
Madison, Wisconsin 55711

April 1975 



The studies for this report were conducted pursuant to
Contract No. POO-75-C^057 with the Office of Education,
U. S. Department of Health, Education, and Welfare.
Contractors undertaking such projects under Government
sponsorship are encouraged to express freely their
professional judgment in the conduct of the project.
Points of view or opinions stated do not, therefore,
necessarily represent official Office of Education or
National Center for Educational Statistics position or
policy.



DESIGN SPECIFICATIONS FOR A MEASURE
OF LIMITED ENGLISH-SPEAKING ABILITY



Some General Boundaries

1. Various DHEW policy documents, and pages 148-149 of the Conference Report on
HR 69, make it clear that we are concerned with measurement of English language
ability, and not with language dominance or with proficiency in any language other
than English.

A second principal and fixed requirement, for a host of good reasons, is that
the national survey of "limited English-speaking ability" (LESA) mandated in Sec.
731 (c) (1) (A) of Title VII ESEA be carried out by the Bureau of the Census for
NCES. It will be "piggybacked" on Census's large-scale national survey of the
economic status of families, mandated in Sec. 822 (a) of PL 93-380.

One may not find the latter measurement context requirement optimal, but the
function of the R & D work for which design features are specified here is to find
solutions within the constraints set forth, and not to raise problems.

Some More Specific Boundaries

2. Census people say that if measurement of LESA is to be carried out in the
Census survey, at least four constraints must be observed.

a. "Testing" in any overt form, identifiable by respondents as such, is definitely
excluded; this applies especially to "paper-and-pencil" tests. This places
a limit on the kinds of response-eliciting stimuli which can be used to get at LESA.

b. Also categorically excluded is electronic recording of what the respondent
says, for later analysis and coding. This places a limit on the kinds of responses
to be recorded and the locus of assessment of these responses.

c. A third explicit constraint: LESA measurement procedures must not break
rapport during the interview, must fit "naturally" into the context and content of
a CPS-like interview (face-to-face or via telephone), and must be within the
capacity of the usual CPS and CPS-like interviewers. (On the whole, the latter are women
35 - ^ years of age, with a high school education.) The procedures must not disrupt
them.

d. The strong preference of the Census staff is for as simple a measure as is
feasible, with a small series of direct questions, answerable by the usual respondent
for the household about all of the other members of the household. (In about 60%
of CPS interviews, this is the mother.) That is, the preference is for enumeration
of the household members, without sampling within the household to select the actual
respondents.

This is a strong Census preference, not an absolute requirement. Whether
this preference can be gratified, given the need for an adequate measure of LESA
(a key NCES requirement), is an empirical question to be answered in the course of
R & D work.

Some Design Specifications
The LESA measurement (survey) wj




considerations set forth which enter into the specifications have the above constraints
and professional standards in mind.




a. Name: This is not to be a test of "limited English-speaking ability."
It is to be a "measure of English language proficiency." It is not to be a measure
of English language competence or aptitude for learning English; it is to be a
measure of English language performance and mastery, as they appear in a defined
measurement situation. Let us call it MELP, for present purposes. It can have
alternative forms.

b. Sec. 703 (a) of Title VII ESEA defines: "The term 'limited English-speaking
ability', when used with reference to an individual, means..." individuals who "have
difficulty speaking and understanding instruction in the English language" because
"they were not born in the United States or whose native language is a language
other than English" or because "they come from an environment where a language
other than English is dominant." Further, "The term 'native language', when used
with reference to an individual of limited English-speaking ability, means the
language normally used by such individuals, or in the case of a child, the language
normally used by the parents of the child."

Other references in PL 93-380 (to preschool education; to auxiliary and
supplementary programs for parents of LESA pupils; to elementary and secondary
education; to bilingual education under the Adult, Vocational and Higher Education
Acts), and the language of Sec. 731 (c) mandating this survey make it clear that
the "individuals" referred to above may be of any age. However, individuals aged
5-17 seem to be of special interest.

Furthermore, the definitions of "program of bilingual education" for Title
VII ESEA and the several educational Acts cited above indicate that the Congress
holds that these programs of instruction are appropriate and necessary because the
LESA (of those whose native language is not English or who come from foreign-language
dominant environments) is a barrier to the effective progress of their education and
training. It is primarily for these persons of LESA that federally-supported
programs of bilingual education are intended.

c. From the words and sentences of PL 93-380, the following interpretations and
inferences may be drawn:

(1) In the survey, MELP is to be obtained only for persons of the defined
demographic and language community characteristics. (For the moment, we put aside
consideration of whether or not comparison data from those in groups defined by
other characteristics ought to be obtained in the survey and/or MELP R & D work.)

This would involve a series of "screening" questions addressed to the
usual respondent for the household in Census surveys. Furthermore, it may turn out
that it is desirable practically to have these questions administered by unselected
Census interviewers for "screening" purposes, with a more complex version of MELP
(see below) later administered by a more highly trained interviewer. As will be seen
below, when the validity-standardization study is discussed, some of these questions
and a few additional simple questions may also be useful and more easily administered
surrogates for MELP in its more elaborated form.

The formulation of these "screening" questions is not a simple matter
at all, and there is considerable controversy as to the nature of language questions
in Census work. (See Lieberson, 1966, and others.) Under these circumstances, it
would be highly desirable that this set of questions be prepared by the R & D
contractor in close association with Census people. Experience with the Bilingual
Supplement to the July 1975 CPS should be helpful in this work.

(2) MELP is to be an individual measure, except for very young children
(where it is to be derived from the "screening" items). It is an empirical question,
for R & D work, as to whether the usual single Census informant about the members
of the household can validly and reliably provide MELP data of sufficiently
discriminatory power about the household's individuals.

If the answer is "no", there will probably be need for sampling and
R & p work ;„J ,»i„i,r,izine the potonUul =-"f„fJ.;,i,„,,,„ t„,ical 

foar or puch awkw=r<ln==s i= 1 

drafters tove '^"i^^i'prc-tation uouW te that. 

Tho conseciU'^nccs oi thi.^ mlcri p„s=ntE>tio,i of printed 

C,) '■^'^^r-.n^^^'^fl:r^li^'S^XZ acoo»oaati..,o to a C.„=us 
materials for readioE ean be k„lA 

constraint. „ j^., individual speaks and no^ 

Ch) I£ direct questions abou ^ r^,^,,,^^ or to sor.eone else 

about '^»./lfi,^v;--r mderstanding behaviors "T^^^^^^^ul do not 
XlS^^^^'^^-l -t^r: 'rLr :"-.C'b';Uors -an assessment 
rian.ra.f S^or b. the respondent 

(c) Given the Census v=:to of '^-^''1^°''^ , exr^erts) , the interviewer 

on forms developed ror,earch and in ^ ^/u^pn tvor.cr..sr,fM^ ?y 

good psychological J''^^:^:^^ e.ociol a-^li- ^ "'-I'^.Thavior. dvring 

people without p™ 

l^rplntcrrctions a..d individual perfornanc.s, xn 

tions. . , ^.^ _c f^ccd vith a choice. It must 

(h^ Whatever form of HELP xs used, v,e .xc ,,/educational levels 

levels or for individuals oilK-ie. ....H-r. to ho ascert;aneo 

This is both a theoretiea, ane e^piric,,!^^^ ^^^^^^^ measurement 

during ^:^P P ^ » "r;:\au:ar=lf ;r t -d°for,n. 

characteristics is the ultimatelj , ,„teruretations, draim from the 

(5) „e continue with our i'^f"="r,?, :f"eif cation pun-oses. Thus, the 
m o',--'% and applied to our cesiyi -P':"';, j m.licatton in the 
'""f "^r-fructiin" n the deknitior. of LESA, -^^l^^'^ 'i„'t,„,:,cd to pro.ote 
rf?nili;ro^.il.n.u.a eauc^ 

effective educatxon ond ^^^"-'"-^ following concluro.on. ^'f • , alr,o b* 

(a) On validity: MELP is to measure what it is intended to measure:
the characteristics and relative proficiency of "speaking and understanding
instruction in the English language," which make a difference or could make a
difference in the individual's progress in a course of education or training.
How "limited" LESA is, for present purposes, is to be referenced against the
language performance of individuals whose LESAs are seen by the schools as barriers of
varying strength to effective learning, when instruction is in English.

(b) This applies to individuals in (preschool?), elementary, secondary,
postsecondary, adult and vocational education programs. MELP validity studies
ideally should be carried out in all of these contexts.

(c) It should be recognized that different educational agencies (SEAs,
LEAs), schools and programs use different measures and criteria (of different worth
in terms of scientific standards) both of LESA and of effective educational progress.
The procedures for identifying individuals for whom LESA is in varying degrees a
barrier to utilizing effectively instruction in English will thus also differ.

The R & D contractor may be able to make some choices among these educational sites,
as to where MELP developmental and validation studies should be carried out. (The
modes of stratification for a purposive sample of sites in which to carry out such
studies are left for later consideration by the R & D contractor.)

(d) For both practical and theoretical reasons, we are not likely to
arrive at a "true" (essentially metaphysical) definition and measure of characteristics
and degrees of LESA which universally ought to facilitate or inhibit educational
attainment. We can obtain administrative identifications, in the schools as they are
and by the identification methods they currently use, of individuals inhibited from
normal educational attainment by LESA. This is a ubiquitous problem in research on
exceptionalities, and the approach suggested here echoes experiences derived from
that research.

(e) Were we to have a sufficiently large and differentiated sample of
educational sites, from sub-sample data we could establish regional, institutional
characteristics and (within the former groupings) age/grade level reference points
for degrees of ELP related to probability or ease/difficulty of effective educational
progress. It is questionable whether, for the purposes of the present national
survey, such differentiated standards are desirable -- or even possible to obtain
in an R & D study of reasonable dimensions. From the MELP data obtained at the total
sample of educational sites in the validity study, and from their review by an expert
group, national "cut-points" for LESA and MELP could be established -- for different
age groups, at least.

Estimating the numbers of LESA persons of various characteristics
by SEA or LEA or other boundaries is an issue and a procedure separable from the
question of separate regional and other standards.

The sites of MELP validation are simultaneously proposed as the sites of MELP
construction, particularly for what we shall call MELP's elaborated form. The
intention would be to develop an "instrument" to measure ELP suited to the Census
survey procedures while reasonably modifying them. This must always be kept in mind.

a. Specialists in applied linguistics have knowledge of the components and
dimensions of phonology (accents, sounds, some dialect features), of lexicon, of
syntax and of utterances to be used to characterize oral production and aural
comprehension. (Parenthetically: bilingual interviewers or non-verbal behavioral
response indicators may be necessary, where an individual comprehends but does not
speak English.) Applied linguists are aware of certain central "diagnostic"
linguistic features of adequate and inadequate English language usage and comprehension.
If they do not already know which of these linguistic features are most highly
correlated with other features of English language usage, they can determine this
empirically in R & D work at the educational sites. (The purpose of this is to
shorten the list of language behaviors to be observed, for entering into an
assessment of ELP made by trained interviewers. The aim is practical while maintaining
a list of critical items long enough for MELP reliability.)
b. The next steps would be to prepare:

(1) tentative "ordinary question" unobtrusive standard stimuli, likely to
elicit the speech production features to be observed in oral production responses;

(2) when these linguistic features are used in the standard stimuli, they
bring forth overt behaviors or speech in English (assuming interviewers are not
bilingual) indicating aural comprehension.

c. Tentative observed language behavior recording and rating/assessment recording
forms (cueing the interviewer as to what kinds of behaviors to observe) would
be prepared. These forms are likely to contain some combination of sets of
qualitative categories, ordinally ordered categories, and continuous (but actually ordinal)
"scales."

d. Tentative eliciting stimuli and response reporting/rating techniques would
be applied "blind" by the R & D team at the validation sites, to individuals
administratively designated by the schools as functioning with varying degrees of LESA
(including zero) which interferes with education and training to varying extents.
Various selected age/grade-level and different dominant non-English language
individuals would be given these measures, the findings being treated separately at
least in this try-out stage.

The key matter to be ascertained is how well which elements of the tentative
MELP discriminate among the categories of LESA-identified individuals. A summary
MELP "score" for oral production, and another one for aural comprehension, would be
derived and validated as above. It may even be possible to develop "scores" for
finer features or sets of features of the individual's language behavior.
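
The question of "how well which elements ... discriminate" can be illustrated with a small sketch. It assumes, purely for illustration, that each tentative item yields a numeric score and that the school's administrative LESA designation is coded as an ordinal number; the item names, the coding, and the choice of a simple correlation as the index are assumptions of this sketch and are not specified in the design statement.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # an item everyone answers alike cannot discriminate
    return cov / sqrt(var_x * var_y)

def rank_items(item_scores, school_category):
    """Order items by how strongly they track the school designation."""
    strengths = {name: pearson(scores, school_category)
                 for name, scores in item_scores.items()}
    return sorted(strengths.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical data: six individuals; school category 0 = no LESA, 2 = severe.
school_category = [0, 0, 1, 1, 2, 2]
item_scores = {
    "item_A": [0, 0, 1, 1, 2, 2],  # tracks the designation exactly
    "item_B": [1, 1, 1, 1, 1, 1],  # constant, so no discrimination
    "item_C": [2, 0, 1, 1, 0, 2],  # unrelated to the designation
}
ranking = rank_items(item_scores, school_category)
```

Items at the top of such a ranking would be candidates to keep; items near zero would be candidates to drop when shortening the list.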

(1) The individuals discussed above should be those defined in PL 93-380
as possessing the specified demographic/language community "screening"
characteristics. One check on elaborated MELP would be to apply it to individuals who lack
these characteristics (e.g., born in U. S. English monolinguals whose parents speak
only English), in the same sites.

(2) These procedures, improved in successive trials, would in later stages
be employed with observer/raters who are Census-type interviewers trained by the
R & D team as it develops its training operations. The stopping-point for R & D
work would be signaled when a valid MELP relatively adequately meeting psychometric
standards for intra- and inter-interviewer reliability and discriminatory power has
been developed. How finely MELP should discriminate the quality of English language
oral and aural mastery is left open; as a practical matter, it will probably be
critical that the finest and most reliable discriminations be made in the central
range of MELP "scores", where instruction-inhibiting LESA transits to
instruction-barring LESA.
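
As one illustration of an inter-interviewer reliability check, the sketch below computes Cohen's kappa (agreement corrected for chance) for two interviewers who rate the same respondents. The choice of kappa, the two-category rating scheme, and the sample ratings are assumptions made for this example; the design statement does not name a particular reliability index.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(ratings_a)
    # Proportion of cases on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected if each rater assigned categories independently
    # at his or her own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of eight respondents by two interviewers.
interviewer_1 = ["limited", "limited", "adequate", "adequate",
                 "limited", "adequate", "adequate", "limited"]
interviewer_2 = ["limited", "adequate", "adequate", "adequate",
                 "limited", "adequate", "limited", "limited"]
kappa = cohen_kappa(interviewer_1, interviewer_2)
```

The same function applied to one interviewer's ratings of the same videotaped cases on two occasions would give a rough intra-interviewer figure.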

(3) The elaborated MELP version thus developed (and alternative versions)
would receive their final validation in education and training sites of the various
kinds other than those utilized for MELP development. The reasons for this are
obvious.

(4) Finally, the developed versions of MELP must be pretested in the field,
in a realistic CPS-like context in cooperation with the Bureau of the Census -- and
revised as is necessary. If MELP has alternative versions, this is the opportunity
to gain information as to which version is the "best" or "least bad" under the
simulated conditions of the national survey.

e. On training of Census-type interviewers for using MELP: It will be necessary
to prepare R & D interviewer training materials suitable for later relatively
standardized training of Census field staff (regular or specially recruited) during a
comparatively short training period carried out at dispersed locations. (Note: The
CPS interviewer field staff meets for training at several central locations each
month. Since the national survey will extend over several months, there should be
opportunity for interviewer retraining and training reinforcement.)

For this purpose, videotaping of behaviors, made during the validation
study -- or perhaps by professional actors following scripts -- and videotape
cassette reproduction is proposed. What would be required, among other things, are:

(1) Videotaped examples of a variety of language behaviors, clearly
displaying the linguistic features to be observed in oral production and the indicators
of aural comprehension -- whether the latter be non-verbal action, or non-English
speech addressed to a bilingual interviewer, or a response in English. Accompanying
each sight-sound example would be a didactic discussion of what features have been
displayed, how they are to be categorized and assessed, what they must not be
confused with, etc. The examples would show individuals of various ages and various
high-frequency English deficits and accomplishments -- whose primary language is
not English.

(2) Videotaped exercises -- relatively discrete segments of oral production
and aural comprehension behaviors, followed by full MELP field interviews, would be
shown. The trainees would be asked to make their categorizations or ratings or
other assessments (including a "global" assessment of ELP) on the standard forms.
The trainer would then give the "correct" answers and how they were arrived at --
all still on videotape. A trainer would be available in person to answer questions
and to receive "feedback" from the trainees. Both MELP and the specifications
for its field administration, as well as the training program, can profit from such
"feedback" -- if experience is to be our guide.

(3) It is possible that an entire training session presented on videotape
for the trainees to observe could have unique training value, in addition to the more
active processes described above. We are familiar with "sing-alongs"; why not a
"measure-along"?

(4) The selection and preparation of material, pretesting and other
activities in connection with the training program constitute an R & D study of itself.
Again, advice and cooperation from Bureau of the Census personnel seem called for.

(5) Accompanying the preparation of videotaped training materials is the
preparation and pretesting of clear written instructions for MELP use, which the
interviewers can refer to in the field. (A toll-free number for the interviewer to
call for advice, if she meets with difficulties in using MELP, would not be amiss.)
In a sense, the interviewer's task then is to compare and assess actual respondent
behaviors against reference standards and examples learned in training and described
in the written instructions.

5. The development of one or more versions of the elaborated MELP described above
is intended to produce the linguistically and psychometrically "best" performance
measure of English language proficiency tied in with educational performance --
one whose quality and relevance will be legitimated by professional and public opinion.

On the other hand, it is reasonable to ask: Are there other measures which can
be developed, psychometrically relatively respectable, correlating relatively highly
with both elaborated MELP and the validity criterion, which possess certain advantages
over elaborated MELP? Among these advantages might be: considerably shorter and
less complex interviewer training required, no need for bilingual interviewers, less
interview time consumed, less potential interview disruption, simpler data processing,
and in general less trouble for the Bureau of the Census and its survey operations.

That is, can we develop measures simpler than elaborated MELP which are
technically "good enough"? Can we trade off some technical quality and quantity of
information for much greater operational ease, and still have a sufficiently reliable,
valid and useful MELP? There is not complete assurance that a technically adequate
elaborated MELP acceptable to the Bureau of the Census can be developed; there is
a good chance of success in these respects. The issues raised above are really
empirical questions, to be answered in R & D work. In any case, elaborated MELP
must be there if the answers to these empirical questions are in the negative.

What more-or-less cumulative set of simpler measurement approaches might be
explored?



a. Extending the range of "screening" questions to include ascertaining the
possible use of English in various domains of language use (home, peers, work, etc.)
and for various communication functions (e.g., radio and TV listening to
English-language stations, re aural comprehension). These questions would probably be put
to the usual CPS respondent for the entire household and about its individual
members, and could include items on specific kinds of difficulties individuals
might have in oral production and aural comprehension.

b. Ratings of household members, individually, on how well they speak and how
well they understand English speech, made by the single respondent for the entire
household.

c. Items equivalent to a. and b. above, where the respondent reports about
and rates himself or herself; the individuals have been selected by within-household
sampling. (There is even some point in a 100% sample of the household "cluster",
where the household was itself selected in a probability sample -- though this would
pose some practical problems.)

d. CPS-type interviewers, with short and simple training, categorize/rate the
respondent on how well the person speaks and how well he understands English -- and
possibly whether his ELP is sufficient to effectively utilize an age-appropriate
educational or training opportunity. In the normal course of an interview, the
interviewer has had an opportunity to observe the language behavior of the respondent,
and is supplied with appropriately cued reporting forms. She can ask direct questions.

e. In R & D work, it may be feasible to obtain a variety of demographic and
language characteristics of the respondent who rates and categorizes persons within
her household, and similar data about the interviewer. From these data, and the
corresponding simple and elaborated MELP data, an appropriate "correction factor"
might be applied to the results of the simpler MELP version to decently estimate
what that measure's value would be on elaborated MELP.

f. Some combination of a. to e. above.

6. A rather different approach would be to ascertain simply-obtained predictors of
the individual's elaborated MELP status and/or predictors of administratively
identified LESA status at the validation sites. Some of the predictors might be the
"screening" question responses of the informant for the household's members; others
might be of the kind suggested in 5. a. to d. above. Still others might be the usual
Census demographic data on household members and data on the household as a unit.

A multiple regression equation, whose regressors are obtainable in a household
interview of the CPS variety, yielding reliable and accurate estimates of the elaborated
MELP or LESA status dependent variables, would be the goal. This could be one of
the distinctive tasks of R & D work.
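
A minimal sketch of such a multiple regression follows. It fits ordinary least squares through the normal equations, using two hypothetical regressors (a screening-question score and years of U.S. schooling) to estimate an elaborated MELP score. The variable names and the data are invented for illustration, and the outcome values were constructed to follow an exact linear rule so that the fit can be checked by hand.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back-substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution

def fit_regression(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...]."""
    design = [[1.0] + list(row) for row in predictors]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coefficients, row):
    """Estimated outcome for one person's regressor values."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], row))

# Hypothetical data: (screening score, years of U.S. schooling) per person.
predictors = [(0, 2), (1, 4), (2, 3), (3, 8), (4, 6), (5, 10)]
# Constructed as 10 + 5 * screening + 2 * years, so the fit should recover it.
elaborated_melp = [14, 23, 26, 41, 42, 55]
coefficients = fit_regression(predictors, elaborated_melp)
```

With real field data the fit would not be exact, and the size of the residuals would indicate whether the simply-obtained predictors are accurate enough to stand in for the elaborated measure.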

7. The MELP produced in R & D work should as far as is possible meet the technical
and other criteria set forth in the 1974 revision of Standards for Educational and
Psychological Tests. It would be beyond the function of this design statement to
rehearse these standards.

8. The R & D team is envisioned as being composed of specialists experienced in
applied linguistics, in several aspects of psychometrics, in educational practices
concerning LESA students, and in survey work as conducted by the Bureau of the Census.
(A Census professional as liaison person with the R & D team is a minimum requirement.)
As far as possible, the staff should include members of the major language communities.
9. Close association with the Bureau of the Census is emphasized for a series of
reasons which affect the appropriate form for MELP.

a. Census people indicate that they have greater freedom of action with respect
to interviews at households included in supplementary samples, compared with the
constraints on interviews at regular CPS panel households. This greater flexibility
pertains to interview content and procedures, and to the possibility of
within-household respondent selection. There will be contingencies in the sample plan for



to the needs of the national survey. These contingencies will have
implications for the MELP form used.

b. Another contingency is the language competence of Census interviewers.
For the very large-scale surveys under discussion here, NCES has been given to
understand that additional interviewers will be hired. It is apparently not an
entirely closed question as to whether bilinguals can or will be specified for
hire. Census could also be asked to ascertain how many of its current interviewers
are bilingual, in what languages other than English, and where located
geographically. The bilingual interviewer permits a simpler form of the measure of aural
comprehension of English (while posing some problems in the accuracy of assessment
of oral production in English). Further, should Census specify that a certain
proportion of the interviews be conducted via telephone, bilingual interviewers
become even more essential. For MELP activity, face-to-face interviews are greatly
to be desired.

10. PL 93-380 provides an excellent roster of the many kinds of public and
professional constituencies interested in the national survey of LESA, and its
implications for bilingual education planning and programs. The communities of
linguists and psychometricians are also involved. All of these groups, in some
advisory capacity to NCES (and by extension to the R & D contractor) can provide
the kinds of legitimations helpful to acceptance of both MELP and the national
survey.

Project Narrative

The following notes summarize the principal activities of the project during
each of its phases:

1. Instrument Development and Refinement (Chapter III): 

June 2 - 8: Stolz and Troike went to San Francisco to meet Ms. Minerva
Mendoza-Friedman to recruit her as the project's San Francisco coordinator. They also met
with Harold Yee, president of Asian, Inc., who advised them on renting office space
and making contacts in the various ethnic communities. In addition, they met with
Ms. Teresa Chen and Prof. Susan Ervin-Tripp, both of University of California -
Berkeley, to initiate recruiting efforts for research assistants and junior research
assistants.

Strick and Jones reviewed possible assessment instruments in Arlington, and
recruiting of LGRs and the planning of the first LGR meeting continued.

June 9 - 15: The San Francisco office was established and nine research assistants
began work on June 12. Strick took charge of developing discrete point tests, and
contacts with the local ethnic communities were established to begin recruitment of
households in which to try out various instruments. Initial versions of instruments
were produced. On June 9, Stolz and Strick consulted with Dr. Charles Herbert of
Chess and Assoc., author of the Basic Inventory of Natural Language, about the
possibility of using the BINL as a criterion instrument.

Initial meetings of the LGRs were held in Arlington June 10 - 19. The schedule
was as follows:

June 10 - 11 Spanish Speakers

June 12 - 13 Native Americans

June 14 - 15 Chinese

June 16 - 17 Asian/Pacific Group

June 18 - 19 Europeans



The agenda and proceedings are attached to this report. Generally, each group
was oriented to the project and the SIE. They reviewed some tentative instruments
for assessing English proficiency. They made suggestions about specific instruments
and/or items that they thought would or would not work in their groups. They also
recommended various interviewing techniques. Representatives of NCES and RTI were
present.

On June 13, Roger Shuy, Director of Domestic Programs at CAL, briefed the 
Federal Interagency Language Roundtable on the project. 

June 16 - 22: Leslie Silverman, Project Monitor for NCES, and Michael Weeks,
Director of interviewer training for RTI, joined the San Francisco staff and began an
extended discussion of the field test design which lasted essentially the entire
week. Silverman and Stolz met with Harold Yee who suggested that the validation
of instruments be carried out within a "known groups" design using pre-identified
LESA and non-LESA samples. This notion was carried back to the design meetings and
formed the basis of most of the discussion. On June 18, Dr. John Upshur of the
University of Michigan joined the group as a specialist in testing language
proficiency. He had been a consultant during the writing of the proposal. On June 19,
Troike and Burton Fisher arrived and joined the discussion.

During this time the research assistants continued to test preliminary versions 
of discrete point tests in the three ethnic communities. Also, a number of junior 
research assistants were recruited. 

In Arlington, Dr. Jeanne Freeman of CAL began developing a behavior observation 
system for use by monitors in observing interviewer-respondent interactions during 
the field test.

June 23 - 29: Twelve junior research assistants joined the staff, and interviewing
using trial instruments began in earnest. The staff divided itself into groups
with each group concentrating on the development of a particular instrument. On
June 27 Stolz and Troike held a briefing for Federal Education Community
representatives in Arlington.

June 30 - July 6: Upshur returned to take temporary charge of the San Francisco
activities while Stolz was not on site. Silverman also returned and began working
with a group of research assistants on drafts of the MELP questions. On June 30 and
July 1, a group of language assessment specialists composed of Ms. Claudia Wilds
of Washington, D. C. (creator of the FSI Oral Interview), Protase Woodford of
Educational Testing Service, Edward D'Avila of Bilingual Children's Television, Dr.
Evelyn Hatch of U.C.L.A., and Sidney Sake of Defense Language Institute reviewed
the progress of instrument development to date and suggested that additional
effort be placed on the development of a direct interviewer-rating system for
use in the field test. Stolz returned to San Francisco on July 2 and began the
development of the Direct Observation Rating Procedure (DORP).

Freeman came to San Francisco to begin testing of the monitoring system.

July 7 - 12: The San Francisco activities centered on:

1. Development of the DORP 

2. Analyzing data collected using trial versions of various instruments, 
with subsequent elimination of poor items or entire tests. 

3. Preparing "final" versions of the MELP questions, discrete point
tests, and DORP for review by OMB and the LGRs at their second meeting.

4. Training staff on the monitoring system using videotapes of inter- 
views recorded earlier in the week. 

July 13 - 18: On July 13 - 14 the second LGR meeting was held in San Francisco.
LGRs were briefed on the progress of the project and then given copies of the
instruments. Members of the project staff role-played interviews with LGRs to
familiarize them with the materials and procedures. Feedback, criticisms, etc.
were solicited from each LGR. Representatives of NCES, Census, and RTI were in
attendance.

The San Francisco operation was then shut down and all field-test materials
underwent final reworking by Strick and RTI's staff to prepare for training RTI
supervisory personnel on July 18.

2. Field Testing the Instruments (Chapter IV): 

July 22 - 24: Interviewer training in El Paso and Miami
July 25 - August 16: Data collection in El Paso and Miami
July 29 - 31: Interviewer training in Arizona and San Francisco
August 1 - 23: Data collection in Arizona and San Francisco

3. Data Analysis: 

September 3 - 4: LGR Meeting #3, Arlington, Va.

September 22 - 24: A conference of experts was held in Arlington to choose the
questions to be recommended as the MELP questions. (Chapter V)

October 2: A memorandum was delivered to NCES recommending the set of questions
to be used in the SIE as the MELP. The memo did not deal with the question of
how to map responses to the questions onto LESA and non-LESA categories.

October 3 - March 30, 1976: Statistical analyses were done, focused on the
production of scoring keys for converting answers to MELP questions into LESA
and non-LESA categorizations (Chapters VI and VII).

March 30 : Contract extended to June 15, 1976 at no additional cost to the govern- 
ment. 

April 5 - 6: Conference of specialists to consider recommendations for additional
activities to recalibrate and/or revalidate the MELP, using data collected in the
SIE. (Chapter XI)



Participants included:

Dr. John Carroll - University of North Carolina
Harold Yee - Asian, Inc., San Francisco
Rosa Inclan - Dade County Public Schools
Burton Fisher - University of Wisconsin
Dr. Daniel Horvitz - R.T.I.
Dr. Tyler Hartwell - R.T.I.
Leslie Silverman - NCES
Dr. Dorothy Waggoner - NCES
John Convay - NCES
Dr. Lepa Tomic - O.C.R.
Roy Rodrigues - O.C.R.
Carter Rolling - N.I.E.
Michael Rand - Bureau of the Census
Marvin Thompson - Bureau of the Census

A report of that meeting is appended to this report. 

April 1 - 30: Analysis of bilingual-monolingual interviewer effects and first-hand
versus proxy responses to MELP questions. (Chapters IX and X)

April 22: Presentation of preliminary MELP project results to American Educational
Research Association (this dissemination activity was not supported by Government
funds).

May 1 - June 15: Preparation of final report.

Accounts of Work with Discrete-Point Criterion Measures which were not Included
in the Final Test.



Tests considered but not field tested. 1. The Bilingual Syntax Measure (Burt,
Dulay and Hernandez-Chavez, 1974) was considered for a test of production. It
was developed to test a child's oral proficiency in English and is an example of
a discrete point, indirect test. The child is shown several cartoon-like pictures
and asked a series of questions about them. The questions are constructed to
elicit specific grammatical structures from the child. There are 25 items on the
test, and it takes 10 to 15 minutes to administer. The scoring is very simple:
one simply counts the number of grammatically correct answers.

Although the test has many good features, it was not further considered for
two reasons. First, it was not applicable to children over 9 years. Second,
the test would have been relatively expensive to use. (The retail price of the
kits would have been over $4000.)

2. Dailey Facility Test. This test (Dailey, 1968) was also considered for
a test of oral production for children. It is not a discrete point test, but rather
an integrative direct test. The child is shown a series of pictures (representing
different domains - school, home, playground) and asked to tell a story based on
each picture. There is no time limit. The stories are recorded. Later a rating of
0 to 9 is given to the story. The following is a description of these ratings.

9 ....A well-organized story with imagination and creativity. Need
      not be original. May use well-known fictional or historical
      characters.

8 ....A complete story, but not a well organized one.

7 ....An interpretation of some elements of implied action or
      intentions, as deduced from or suggested by the picture but not a
      complete story.

6 ....A detailed description of what is happening, but nothing about
      past or future action or intentions. At level 6 all or nearly
      all of the elements of the picture will be covered, in contrast
      to level 5 where only some selected elements will be covered.

5 ....A partial description consisting of two or more sentences with
      some description of movement or action as seen in the picture.

4 ....Two or more sentences describing persons or objects but no verb
      of action or indication of interaction between a person and an
      object.

3 ....A complete sentence that makes sense.

2 ....Compound responses, two or more words at a time, a single word
      describing action, or more than one single-noun response.

1 ....One single-noun response.

0 ....No response, garbled speech, or only pointing at picture.

The test was dropped from further consideration for two reasons. First, the
pictures were unsuitable: many were culturally biased; others were too
sophisticated for children. Second, the rating system was too ambiguous. It was felt
that it could not be used reliably without much interviewer training and further
development of the scoring system.

3. The Basic Inventory of Natural Language (BINL). This test was considered
for a test of oral production. It was developed by Charles Herbert (1975) to
measure a child's oral language dominance and proficiency. Children are trained to
tell stories (based on a set of visual materials) to their peers. The stories are
recorded and later transcribed. A set of 10 utterances is then selected for
analysis. They are scored for fluency (the average number of words per utterance) and
syntactic complexity (different weights are given to utterances with full sentences,
partial sentences, phrases, and clauses). The test thus falls into the discrete
point direct category.

Although the elicitation technique used in this procedure was very appealing,
the test was not used for pilot-testing for two reasons. First, it could not be
assured that there always would be a second child in the household to whom the
target could narrate a story. Second, it was felt that the scoring procedure
lacked clear face validity.

Measures which were piloted and then dropped. As explained above, these were tests
that were fielded in San Francisco, and then completely eliminated from the
battery. There were two such tests.

1. Word Naming. This test was developed by Fishman, Cooper, and Ma (1971)
to measure bilingual proficiency and is an integrative indirect measure. Basically
the respondent was asked to name as many different words as possible which were
found in a particular domain. For example, he was given 1 minute to name in English
objects found in the home. Other domains were school, neighborhood, and work. This
procedure was also repeated in Spanish. Fishman found high correlations between
the number of words given and the most frequently used language in the home. There
was also a high negative correlation between the number of English words and a
Spanish literacy factor.

The test was adapted in the following ways for our purposes. It was used as
a test of oral production and was only given in English. It was administered to
both children and adults. Each respondent was asked to name objects in 3 domains.
Adults were asked to name objects found at home, in the neighborhood and at work.
They were given one minute for each domain. Similarly, children were asked to
name objects found at home, in the neighborhood and at school. The score for each
respondent was the total number of different and contextually appropriate object
names (see Appendix 11 for instructions and questions).

The test was dropped from the battery because of the difficulty of controlling
the testing situation. That is, it was found that the subject would often
look around the room where he was tested and name the objects present. Thus
scores were a function of the "business" of the room in which the subject was
tested. Because not enough time was available to modify the technique, or to
standardize the situation, the test was eliminated from the battery.
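
The Word Naming score described above (the number of different, contextually
appropriate object names) can be illustrated with a minimal sketch. The
function name, the data layout, and the set of "appropriate" names are all
hypothetical; in the pilot, appropriateness was a matter of scorer judgment,
and the report does not say whether a name repeated in two domains counted
twice. The sketch assumes it did.

```python
def word_naming_score(responses, appropriate):
    """Count different, contextually appropriate object names.

    responses:   dict mapping domain -> list of words the respondent said
    appropriate: dict mapping domain -> set of names judged appropriate
                 (hypothetical stand-in for the scorer's judgment)
    Assumption: repeats are dropped within a domain, but the same name
    may count once in each domain where it is appropriate.
    """
    counted = set()
    for domain, words in responses.items():
        allowed = appropriate.get(domain, set())
        for word in words:
            name = word.strip().lower()
            if name in allowed:
                counted.add((domain, name))
    return len(counted)


# Made-up example: "Chair" repeats "chair"; "tree" is not a home object.
responses = {"home": ["chair", "Chair", "table", "tree"],
             "work": ["desk", "chair"]}
appropriate = {"home": {"chair", "table"}, "work": {"desk", "chair"}}
score = word_naming_score(responses, appropriate)
```

A respondent who simply names what is visible in the room still scores well
under this rule, which is the weakness that led to the test being dropped.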

2. ETS Listening Comprehension Test. This unpublished test was originally
developed by ETS for the Puerto Rican Ministry of Education to test students'
level of achievement of certain curriculum materials. As will be seen, CAL adapted
this test to measure English receptive and productive ability in children and
adults.

The test had four levels: 

Level 1 was given to children in grades 1-3. 
Level 2 was given to children in grades 4-6.
Level 3 was given to children in grades 7-9.
Level 4 was given to students in grades 10 and above.

Levels 1, 2, and 3 had two sections. In Part 1 the subject was shown 4
pictures. The examiner said a sentence (e.g. There is a spoon on the table ) and
asked the subject to point to the best picture. In Part 2, the subject was shown
4 pictures and read a short passage. The examiner then asked him a question about
the passage. The subject was required to point to the most appropriate picture (e.g.
A boy broke Jane's bicycle. Her father fixed it, and she helped him by handing him
the tools he needed. What was broken? )

Level 4 only had one section which corresponded to Part 1 described above.
Each test had the following number of items.

           Part 1    Part 2    Total

Level 1      50        10        60
Level 2      45        10        55
Level 3      50        20        70
Level 4      70       none       70

The total score was simply the number of correctly identified pictures. The
test was clearly a discrete point indirect one.

In the San Francisco pilot work CAL made the following major modification.
For each item described above, not only was the respondent required to point to
the appropriate picture but he also had to say the answer. In the case of Part 1,
this meant repeating the sentence said by the examiner. In Part 2, the respondent
was required to verbally answer the question. Thus each item was scored for
information and grammar. In Part 1 a number of crucial structures were identified in
each sentence. If those were correctly repeated the subject would receive a point.
The number of structures varied from sentence to sentence; some had one ( The boy
hit the ball ), some had more ( That boy wants to play baseball ). A point was given
for each correctly repeated target structure. In case the response was a totally
grammatical alternate, the respondent was given only one point in addition to the
possible point for identification. In Part 2, a point for correct grammar was
given only if the information in the sentence was correct as well. The answers did
not have to be complete sentences.
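
The item scoring rules above can be sketched as two small functions. This is
an illustration under stated assumptions, not the scoring sheet actually used:
the function and parameter names are hypothetical, and the sketch assumes that
correct information in Part 2 earns exactly one point, which the text implies
but does not state.

```python
def score_part1(identified, n_targets, n_repeated, grammatical_alternate=False):
    """Score one Part 1 item (point to a picture, then repeat the sentence).

    identified:            True if the correct picture was pointed to
    n_targets:             number of target structures in the sentence
    n_repeated:            number of target structures repeated correctly
    grammatical_alternate: True if the response was a fully grammatical
                           sentence other than the one said by the examiner
    """
    points = 1 if identified else 0
    if grammatical_alternate:
        points += 1                            # only one grammar point allowed
    else:
        points += min(n_repeated, n_targets)   # one point per target structure
    return points


def score_part2(information_correct, grammar_correct):
    """Score one Part 2 item (answer a question about a short passage).

    Assumption: correct information is worth one point; the grammar point
    is awarded only when the information is also correct.
    """
    points = 1 if information_correct else 0
    if information_correct and grammar_correct:
        points += 1
    return points
```

Under these rules a sentence with two target structures, correctly identified
and fully repeated, is worth three points, while a grammatical alternate for
the same item is worth two.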

The test was given to both children and adults. Forms were selected by age
rather than grade; thus if a 20-year-old subject only had a grade 5 level education
he was given Level 4 rather than Level 2.

As the pilot work progressed, items were eliminated from the tests when they 
appeared to be culturally inappropriate or did not discriminate good from poor 
speakers. (See Appendix 11 for various forms and developments of the test). 

Eventually the test was entirely eliminated. The pre-emptive reason was that
CAL had to receive permission from the Puerto Rican government in order to use it.
This process would have been too lengthy and complicated. There were also other
problems with the test: each level was too long; the scoring of the production part
was troublesome. That is, a respondent might correctly repeat the target
structures, but make mistakes in other parts of the sentence and still receive a perfect
score.

The Window Rock Data

We have already mentioned that the school lists were constructed in Window
Rock based solely on the students' scores on the comprehension section of the
Gates-MacGinitie Reading Test. Moreover, the assignment to lists was done purely
on the basis of grade level; i.e., if a student's comprehension score was below
his grade level, he was placed on the "low" or LESA list; otherwise he was placed
on the "high" or non-LESA list.

Examining the data from Window Rock, it became immediately clear that the
list information was not appropriate for our purposes. Consider Tables 1a and 1b
below. Table 1a shows the relationship between test score, in terms of total
points correct, and grade level, while Table 1b shows the relationship between
list membership and grade level.





Table 1a: Window Rock Children by grade and test score.

                 Test: total points
Grade       0-30      31-50     51-67     Total

K-3           7         21        18        46
4-6           1         22        66        89
7-8           0          3        33        36

Table 1b: Window Rock Children by grade and school list.

                         List
Grade       LESA -           Non-LESA -             Total
            below grade      at or above grade

K-3             9                 37                  46
4-6            54                 35                  89
7-8            27                  9                  36



If test score is taken as the measure of English proficiency, Table 1a
supports the hypothesis that, by and large, the older children know more English
than the younger ones. This general pattern was replicated in all other sites,
regardless of whether test score or list was used as a measure of English
proficiency. However, Table 1b would indicate just the opposite: that the older
the child is, the less he knows of English. This is a truly aberrant pattern
given all of our data and what is known about second language acquisition. The
problem characterized in Table 1b, therefore, seems to be peculiar to reading
and not to English proficiency. That is, it appears that the Window Rock children
rapidly fall behind nationally normed grade levels in reading comprehension as
they grow older. However, the conclusion that this is due to a decrease in their
English proficiency appears not to be tenable.
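
The opposing trends in Tables 1a and 1b are easiest to see as row percentages.
The short sketch below uses the counts from the two tables; the variable and
function names are illustrative only.

```python
# Counts transcribed from Tables 1a and 1b (Window Rock children).
table_1a = {  # grade band -> counts scoring 0-30, 31-50, 51-67 points
    "K-3": (7, 21, 18),
    "4-6": (1, 22, 66),
    "7-8": (0, 3, 33),
}
table_1b = {  # grade band -> counts on (LESA list, non-LESA list)
    "K-3": (9, 37),
    "4-6": (54, 35),
    "7-8": (27, 9),
}


def share(part, row):
    """Return `part` as a percentage of the row total, to one decimal."""
    return round(100 * part / sum(row), 1)


# Percent of each grade band in the highest test-score band (51-67 points).
top_band = {grade: share(row[2], row) for grade, row in table_1a.items()}

# Percent of each grade band placed on the LESA (below grade level) list.
lesa = {grade: share(row[0], row) for grade, row in table_1b.items()}
```

The share of children in the top score band rises with grade (about 39, 74,
and 92 percent), while the share on the LESA list also rises (about 20, 61,
and 75 percent). The two criteria therefore move in opposite directions as
measures of proficiency.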

On the basis of these data, we decided not to use the Window Rock list
information in deriving our scoring keys. Thus, when list was used as a criterion
variable, only the data from Ganado were utilized. Of course, when test scores
were the criterion measure, the data from all Navajo children were combined into
a single sample.

Appendix 6

Regression Analysis - Children

At an early stage in the analysis of the field test data, a series of
multiple regression analyses were performed for both descriptive and analytical
reasons. Later, however, it became clear that discriminant analysis was more
to the point of this project and that the regression analyses added nothing to
it. Thus, these analyses did not result in a scoring key. The basic results of
the multiple regression analyses will be briefly presented below for those who
are accustomed to thinking about multi-variate prediction problems such as the
present one in regression terms.

Table 1 presents the regression analyses within each group and for all
groups pooled using the ten MELP variables as predictors and FCTR as the criterion.
Coefficients denoted as B are unstandardized while those denoted as β are
standardized.

Appendix 7

Regression Analysis - Adults

The adult data were subjected to multiple regression analysis using the
11 MELP variables as predictors and FCTR (not dichotomized) as criterion. Table 1
gives the regression analysis as performed within each ethnic group and across all
groups.
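
For readers less familiar with the two kinds of coefficient reported in these
regression tables, the sketch below shows how an unstandardized coefficient (B)
is rescaled into a standardized one. It uses made-up numbers and a single
predictor for brevity; it is not a reanalysis of the field test data, and the
names are illustrative. In a multiple regression the same rescaling applies to
each predictor's coefficient in turn.

```python
from statistics import mean, stdev


def slope(x, y):
    """Unstandardized least-squares coefficient B for one predictor."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardized(b, x, y):
    """Rescale B by the ratio of predictor and criterion standard deviations."""
    return b * stdev(x) / stdev(y)


x = [1, 2, 3, 4, 5]   # made-up predictor values (e.g. a question response)
y = [2, 4, 5, 4, 5]   # made-up criterion values (e.g. a test score)

b = slope(x, y)                 # change in y per one-unit change in x
beta = standardized(b, x, y)    # change in y, in SD units, per SD of x
```

Because the standardized coefficient is free of the original measurement
units, it allows predictors scored on different scales to be compared within
one analysis.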
Staff Utilization and Technical Consultants